Purdue University Graduate School

Robust Representation Learning for Out-of-Distribution Extrapolation in Relational Data

thesis
posted on 2024-04-17, 12:47 authored by Yangze Zhou

Recent advancements in representation learning have significantly enhanced the analysis of relational data across various domains, including social networks, bioinformatics, and recommendation systems. In general, these methods assume that the training and test datasets come from the same distribution, an assumption that often fails in real-world scenarios due to evolving data, privacy constraints, and limited resources. The task of out-of-distribution (OOD) extrapolation emerges when the distribution of test data differs from that of the training data, presenting a significant yet unresolved challenge within the field. This dissertation focuses on developing robust representations for effective OOD extrapolation, specifically targeting relational data types like graphs and sets. For successful OOD extrapolation, it is essential to first acquire a representation that is adequately expressive for tasks within the distribution. In the first work, we introduce Set Twister, a permutation-invariant set representation that generalizes and enhances the theoretical expressiveness of DeepSets, a simple and widely used permutation-invariant representation for set data, allowing it to capture higher-order dependencies. We showcase its implementation simplicity and computational efficiency, as well as its competitive performance against more complex state-of-the-art graph representations in several graph node classification tasks. Secondly, we address OOD scenarios in graph classification and link prediction tasks, particularly when faced with varying graph sizes. Under causal model assumptions, we derive approximately invariant graph representations that improve extrapolation in OOD graph classification tasks.
Furthermore, we provide the first theoretical study of the capability of graph neural networks for inductive OOD link prediction and present a novel representation model that produces structural pairwise embeddings, maintaining predictive accuracy for OOD link prediction as the test graph size increases. Finally, we investigate the impact of environmental data as a confounder between input and target variables, proposing a novel approach utilizing an auxiliary dataset to mitigate distribution shifts. This comprehensive study not only advances our understanding of representation learning in OOD contexts but also highlights potential pathways for future research in enhancing model robustness across diverse applications.

Degree Type

  • Doctor of Philosophy

Department

  • Statistics

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Bruno Ribeiro

Advisor/Supervisor/Committee co-chair

Vinayak Rao

Additional Committee Member 2

Nianqiao Ju

Additional Committee Member 3

Xiao Wang
