Purdue University Graduate School

Causal Data Science: Estimating Identifiable Causal Effects

thesis
posted on 2025-06-12, 12:30 authored by Yonghan Jung

Causal inference is central to scientific inquiry and decision-making across numerous disciplines, including the social sciences, economics, biology, and medicine. Drawing causal conclusions from data involves two primary tasks: first, causal effect identification, which determines whether a causal effect can be computed from the available data and assumptions about the data-generating process (often encoded in a causal graph); and second, causal effect estimation, which quantifies this effect from finite samples. While the theory of causal effect identification is now well established, with comprehensive graphical and algorithmic solutions for determining when interventional distributions can be uniquely recovered, estimating these effects in practice remains a significant challenge. This is particularly true when the identified causal estimand is complex, moving beyond standard scenarios such as simple covariate adjustment, or when data are drawn from heterogeneous sources. Consequently, a gap remains between our theoretical understanding of which causal effects are identifiable and our ability to quantify them reliably and efficiently in practice.

This dissertation addresses this critical gap by providing a unified and systematic approach to the estimation of identifiable causal effects. It offers a comprehensive framework designed to empower computer scientists, statisticians, and data scientists with robust and sample-efficient tools. Key to this framework is the ability to represent diverse identified estimands—whether arising from purely observational data or from a fusion of observational and experimental sources—as a composition of more fundamental adjustment-like quantities.

Leveraging this structured representation, the dissertation develops doubly robust and debiased machine learning estimators for a broad spectrum of interventional distributions. These estimators are designed for resilience against model misspecification and achieve fast convergence rates, facilitating the construction of valid confidence intervals and enhancing the practical applicability of causal estimation.
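As a rough illustration of what "doubly robust" means in the simplest case, the sketch below implements the textbook augmented inverse probability weighting (AIPW) estimator of an average treatment effect under back-door adjustment. It is not the dissertation's estimator: the simulated data, the linear and logistic nuisance models, and the names `fit_logistic` and `aipw_ate` are all assumptions made for this example, and cross-fitting (typical in debiased machine learning) is omitted for brevity.

```python
import numpy as np


def fit_logistic(X, t, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def aipw_ate(x, t, y):
    """Textbook AIPW estimate of E[Y | do(T=1)] - E[Y | do(T=0)] with a standard error."""
    X = np.column_stack([np.ones(len(x)), x])
    # Nuisance 1: propensity score e(x) = P(T=1 | x), clipped to avoid extreme weights.
    e = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))
    e = np.clip(e, 0.01, 0.99)
    # Nuisance 2: outcome regressions fitted separately in each treatment arm.
    b1 = np.linalg.lstsq(X[t == 1], y[t == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(X[t == 0], y[t == 0], rcond=None)[0]
    m1, m0 = X @ b1, X @ b0
    # Plug-in difference plus inverse-probability-weighted residual corrections.
    psi = m1 - m0 + t * (y - m1) / e - (1.0 - t) * (y - m0) / (1.0 - e)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))


# Hypothetical simulated data: x confounds t and y, and the true effect is 2.0.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * x))).astype(float)
y = 2.0 * t + x + rng.normal(size=n)

est, se = aipw_ate(x, t, y)
print(f"AIPW estimate: {est:.3f}, 95% CI half-width: {1.96 * se:.3f}")
```

The estimate stays consistent if either the propensity model or the outcome model is correctly specified, which is the sense of "doubly robust" here; the sample standard deviation of `psi` gives the standard error used for the confidence interval.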

Furthermore, to address limitations in coverage and scalability of many prior approaches, this work introduces the Unified Covariate Adjustment (UCA) framework. The UCA framework significantly expands the range of estimable causal quantities to include important effects such as the effect of treatment on the treated (ETT), mediation effects, and transportability, while also providing a scalable estimation solution for complex sum-product estimands (like front-door) that were previously computationally challenging in high-dimensional settings.
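For reference, the front-door estimand mentioned above has the following standard sum-product form. The symbols are assumptions introduced here and do not appear in the abstract: X is the treatment, Z the mediator, and Y the outcome.

```latex
P(y \mid do(x)) \;=\; \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z)\, P(x')
```

The nested sums range over all values of Z and X, which gives a sense of why direct plug-in evaluation becomes costly when these variables are high-dimensional.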

Ultimately, by developing systematic methodologies for deriving and estimating a wide array of causal estimands, this dissertation aims to make sophisticated causal inference techniques more accessible, reliable, and practically applicable, thereby narrowing the divide between formal causal theory and its real-world implementation.

Degree Type

  • Doctor of Philosophy

Department

  • Computer Science

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Elias Bareinboim

Advisor/Supervisor/Committee co-chair

Jennifer Neville

Additional Committee Member 2

Yexiang Xue

Additional Committee Member 3

Jin Tian

Additional Committee Member 4

Iván Díaz