Purdue_University_Thesis_Vachik_Dave.pdf (5.32 MB)

Download file# SOLVING TIME PREDICTION PROBLEMS IN NETWORKS USING GRAPHLETS AND EMBEDDING BASED LOCAL FEATURES

Real-world networks are inherently dynamic; vertices and edges of these networks appear or disappear in a systematic pattern over time. Hence, the exact time at which the network elements, such as, the vertices or the edges appear or disappear is important information for modeling such networks. Additionally, as we predict a future event (say, link generation) on the network, it is also important to predict the exact time of that future event, because the availability of the event time makes the event prediction more valuable in terms of real-life utility of that prediction. Unfortunately, existing works on dynamic networks do not consider the time value; neither do they use the time information for modeling the network nor do they predict the time of a future event.

In this thesis, I have solved the event time prediction variant of multiple well-known problems in social networks. For instance, link prediction is a well known and possibly the most-studied problem in network analysis. But, the existing solutions to this task only predict whether a future link will appear or not—on some occasions, with a probability value. Unlike the existing works, in my thesis, I have proposed machine learning solutions that answer the question, “when will a link appear?”, instead of answering whether a link will appear. I have also developed methods to use the time value of a link for the network modeling task, specifically, for learning features of a network element (a vertex, or an edge), which can subsequently be used for predicting the time value of a future event in the network. As I solve the time prediction problem in the network, I target to predict the link creation time in both directed and undirected networks. To put it succinctly, this thesis opens up a new dimension in the network analysis task, where the time of the event has been considered explicitly both in the modeling and also in the prediction. The specific time prediction problems that I have designed are discussed below.

The first problem is Reciprocal Link Time Prediction (RLTP) problem, which is designed to predict the creation times of reciprocal links. RLTP is a versatile tool which can be applied to many real-world applications. For example, RLTP can be used to predict the elapsed time between receiving an email and sending its reply, or it can be used to determine the follow-back time or friend request acceptance time in online social networks. The second problem is Triangle Completion Time Prediction (TCTP) problem, which is designed to predict the creation time of a link that completes one or more triangles. A triangle is a prominent and basic building block of social networks and it is shown by researchers that the majority of new links created in social networks complete a triangle(s) in the network. Hence, a good solution of this problem can effectively improve the performance of various network analysis problems such as the link prediction problem, network expansion study, network generation models, community structure generation. Also, a solution of this problem provides an ordering of the future links based on their creation time, which is very useful to rank the user recommendations in different domains. Lastly, time prediction is the main theme of my research, but the machine learning solutions to such prediction problems require effective and efficient feature generation schemes, which can scale to large, and complex networks. So, some of my published works, which are also part of this thesis, focused on building effective features for network time prediction.