Purdue University Graduate School
Asymmetry Learning for Out-of-distribution Tasks

thesis
posted on 2024-05-02, 17:45 authored by Chandra Mouli Sekar

Despite their astonishing capacity to fit data, neural networks have difficulty extrapolating beyond the training data distribution. When the out-of-distribution prediction task is formalized as a counterfactual query on a causal model, the reason for their extrapolation failure is clear: neural networks learn spurious correlations in the training data rather than features that are causally related to the target label. This thesis proposes to perform a causal search over a known family of causal models to learn robust (maximally invariant) predictors for single- and multiple-environment extrapolation tasks.

First, I formalize the out-of-distribution task as a counterfactual query over a structural causal model. For single-environment extrapolation, I argue that symmetries of the input data are valuable for training neural networks that can extrapolate. I introduce Asymmetry learning, a new learning paradigm that is guided by the hypothesis that all (known) symmetries are mandatory even without evidence in training, unless the learner deems them inconsistent with the training data. Asymmetry learning performs a causal model search to find the simplest causal model defining a causal connection between the target labels and the symmetry transformations that affect the label. My experiments on a variety of out-of-distribution tasks on images and sequences show that the proposed methods extrapolate much better than standard neural networks.

Then, I consider multiple-environment out-of-distribution (OOD) tasks in dynamical system forecasting that arise due to shifts in the initial conditions or parameters of the dynamical system. I identify key OOD challenges in existing deep learning and physics-informed machine learning (PIML) methods for these tasks. To mitigate these drawbacks, I combine meta-learning and causal structure discovery over a family of given structural causal models to learn the underlying dynamical system. In three simulated forecasting tasks, I show that the proposed approach is 2x to 28x more robust than the baselines.

Funding

CAREER IIS-1943364

CCF-1918483

CNS-2212160

Wabash Heartland Innovation Network (WHIN)

Amazon Research Award

Ford

Nvidia

CISCO

AnalytiXIN

Amazon

Degree Type

  • Doctor of Philosophy

Department

  • Computer Science

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Bruno Ribeiro

Additional Committee Member 2

David Gleich

Additional Committee Member 3

Yexiang Xue

Additional Committee Member 4

Christopher Clifton
