Causal Data Science: Estimating Identifiable Causal Effects
Causal inference is central to scientific inquiry and decision-making across numerous disciplines, including the social sciences, economics, biology, and medicine. Drawing causal conclusions from data involves two primary tasks: causal effect identification, which determines whether a causal effect can be computed from the available data and assumptions about the data-generating process (often encoded in a causal graph); and causal effect estimation, which quantifies this effect from finite samples. The theory of causal effect identification is now well established, with comprehensive graphical and algorithmic solutions for determining when interventional distributions can be uniquely recovered, but estimating these effects in practice remains a significant challenge. This is particularly true when the identified causal estimand is complex, going beyond standard scenarios such as simple covariate adjustment, or when data are drawn from heterogeneous sources. Consequently, a gap remains between our theoretical understanding of which causal effects are identifiable and our ability to quantify them reliably and efficiently in practice.
This dissertation addresses this critical gap by providing a unified and systematic approach to the estimation of identifiable causal effects. It offers a comprehensive framework designed to empower computer scientists, statisticians, and data scientists with robust and sample-efficient tools. Key to this framework is the ability to represent diverse identified estimands—whether arising from purely observational data or from a fusion of observational and experimental sources—as a composition of more fundamental adjustment-like quantities.
Leveraging this structured representation, the dissertation develops doubly robust and debiased machine learning estimators for a broad spectrum of interventional distributions. These estimators are designed for resilience against model misspecification and achieve fast convergence rates, facilitating the construction of valid confidence intervals and enhancing the practical applicability of causal estimation.
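As a point of reference for the doubly robust idea, the sketch below implements the standard augmented inverse probability weighting (AIPW) estimator for the simplest case, a back-door (covariate) adjustment with a binary treatment. It is not the dissertation's estimator: the simulated data, the function names (`fit_logistic`, `aipw_ate`, `simulate`), the linear and logistic nuisance models, and the clipping threshold are all assumptions made for illustration, and the sample splitting (cross-fitting) typically used in debiased machine learning is omitted for brevity.

```python
import numpy as np

def fit_logistic(X, t, iters=25):
    """Logistic regression via Newton's method; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def aipw_ate(w, t, y):
    """AIPW estimate of E[Y | do(T=1)] - E[Y | do(T=0)], adjusting for covariate w."""
    X = np.column_stack([np.ones(len(w)), w])
    # Nuisance 1: propensity score P(T=1 | W), clipped away from 0 and 1.
    e = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))
    e = np.clip(e, 0.01, 0.99)
    # Nuisance 2: outcome regressions E[Y | T=1, W] and E[Y | T=0, W].
    b1 = np.linalg.lstsq(X[t == 1], y[t == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(X[t == 0], y[t == 0], rcond=None)[0]
    m1, m0 = X @ b1, X @ b0
    # Regression prediction plus an inverse-probability-weighted residual correction.
    psi = m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1.0 - e)
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(len(psi))
    return est, se

def simulate(n, seed=0):
    """Hypothetical data: W confounds T and Y; the true effect of T on Y is 2.0."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * w)))
    y = 2.0 * t + 1.5 * w + rng.normal(size=n)
    return w, t, y

w, t, y = simulate(5000)
est, se = aipw_ate(w, t, y)
naive = y[t == 1].mean() - y[t == 0].mean()  # ignores confounding by W
print(f"naive difference: {naive:.3f}")
print(f"AIPW estimate:    {est:.3f}  (95% CI half-width {1.96 * se:.3f})")
```

The estimate remains consistent if either the propensity model or the outcome model is correctly specified, which is the sense in which such estimators are called doubly robust. The standard error computed from `psi` is what makes a normal-approximation confidence interval available.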
Furthermore, to address the limited coverage and scalability of many prior approaches, this work introduces the Unified Covariate Adjustment (UCA) framework. The UCA framework significantly expands the range of estimable causal quantities to include important effects such as the effect of treatment on the treated (ETT), mediation effects, and transportability, while also providing a scalable estimation solution for complex sum-product estimands (such as the front-door estimand) that were previously computationally challenging in high-dimensional settings.
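For readers unfamiliar with the term, the standard front-door adjustment formula illustrates what a sum-product estimand looks like. Here X is the treatment, Y the outcome, and Z a mediator satisfying the front-door criterion; these symbols are chosen for illustration and are not taken from the dissertation.

```latex
P(y \mid do(x)) \;=\; \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z)\, P(x')
```

A naive plug-in evaluation requires summing (or integrating) products of conditional distributions over every value of z and x', which becomes expensive when these variables are high-dimensional.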
Ultimately, by developing systematic methodologies for deriving and estimating a wide array of causal estimands, this dissertation aims to make sophisticated causal inference techniques more accessible, reliable, and practically applicable, thereby narrowing the divide between formal causal theory and its real-world implementation.
Degree Type
- Doctor of Philosophy
Department
- Computer Science
Campus location
- West Lafayette