Graph representation learning and Graph Neural Network (GNNs) models provide flexible tools for modeling and representing relational data (graphs) in various application domains. Specifically, node embedding methods provide continuous representations for vertices that has proved to be quite useful for prediction tasks, and Graph Neural Networks (GNNs) have recently been used for semi-supervised node and graph classification tasks with great success.
However, most node embedding methods for unsupervised tasks consider a simple, sparse graph, and are mostly optimized to encode aspects of the network structure (typically local connectivity) with random walks. And GNNs model dependencies among the attributes of nearby neighboring nodes rather than dependencies among observed node labels, which makes it not expressive enough for semi-supervised node classification tasks.
This thesis will investigate methods to address these limitations, including:
(1) For heterogeneous graphs: Development of a method for dense(r), heterogeneous graphs that incorporates global statistics into the negative sampling procedure with applications in recommendation tasks;
(2) For capturing long-range role equivalence: Formalized notions of representation-based equivalence w.r.t regular/automorphic equivalence in a single graph or multiple graph samples, which is employed in a embedding-based models to capture long-range equivalence patterns that reflect topological roles;
(3) For collective classification: Since GNNs model dependencies among the attributes of nearby neighboring nodes rather than dependencies among observed node labels, we develop an add-on collective learning framework to GNNs that provably boosts their expressiveness for node classification tasks, beyond that of an {\em optimal} WL-GNN, utilizing self-supervised learning and Monte Carlo sampled embeddings to incorporate node labels during inductive learning for semi-supervised node classification tasks.