Purdue University Graduate School
Browse

Graph Learning at Scale: Algorithms, Systems, and Applications

thesis
posted on 2025-03-07, 18:24 authored by Haoteng YinHaoteng Yin

Graph-structured data capture complex relationships and interactions between entities, offering valuable insights for scientific discovery, business modeling, and AI-driven decision-making. Despite its transformative potential, learning on graphs faces two key challenges: (1) scaling expressive learning approaches, especially subgraph-based graph representation learning, and (2) ensuring privacy when handling sensitive relational data. Both challenges arise from intricate dependencies in graph structures, which limit the effectiveness of canonical algorithms and system optimizations. This dissertation addresses these challenges through a unified framework that integrates system-aware algorithm design across two main thrusts.

In Thrust I, we develop a family of efficient frameworks for expressive graph representation learning that eliminate redundancy in subgraph-based methods. By decoupling dependencies over task-specific input features (i.e., query-induced subgraphs), the proposed paradigm enables efficient higher-order pattern discovery, scalable network analysis on billion-edge graphs, and low-latency online inference using reusable, task-agnostic features derived from random walks, node-set sampling, and neighborhood hashing. In Thrust II, we extend the design principle to privacy-preserving relational learning, where structural dependencies in graphs often violate the gradient decoupling assumption in standard privacy learning mechanisms like differentially private stochastic gradient descent (DP-SGD). We propose the first differential private relational learning framework that disentangles sample dependencies through a tailored DP-SGD approach. This framework enables the private fine-tuning of large language models (LLMs) on sensitive graph data, effectively addressing associated computational complexities while achieving strong privacy-utility trade-offs.

By co-designing learning algorithms and system implementations, this dissertation demonstrates how graph-based AI can be both scalable and trustworthy, opening new avenues for learning from complex structured data in real-world applications.

Funding

NSF PHY-2117997

NSF CCF-2402816

NSF IIS-2239565

History

Degree Type

  • Doctor of Philosophy

Department

  • Computer Science

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Pan Li

Advisor/Supervisor/Committee co-chair

David Gleich

Additional Committee Member 2

Ananth Y. Grama

Additional Committee Member 3

Jianguo Wang