A Double Machine Learning Approach to Combining Experimental and Observational Data
Harsh Parikh, Marco Morucci, Vittorio Orlandi, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky
TL;DR
The paper tackles the challenge of estimating population treatment effects using both experimental and observational data when external validity or ignorability may be violated. It introduces a double machine learning framework with cross-fitting, efficient influence functions, and a falsification test based on the statistic $\theta(t)$ to detect violations of A3 or A4. A key theoretical contribution is the impossibility of a truly doubly resilient estimator under unknown violations, motivating the proposed two-stage estimators for $\theta(t)$ and $\nu(t)$ that are root-$n$ consistent and provide valid confidence intervals. The authors validate their approach with synthetic data and three real-world applications (STAR, CASS, Lalonde NSW/PSID), demonstrating improved data fusion performance, interpretable tests for validity, and practical guidance for empirical causal inference. Overall, the work offers a principled, scalable toolkit for leveraging mixed data sources while diagnosing and adjusting for potential violations in causal identification.
Abstract
Experimental and observational studies often lack validity due to untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption violations and estimate treatment effects consistently. Our framework proposes a falsification test for external validity and ignorability under milder assumptions. We provide consistent treatment effect estimators even when one of the assumptions is violated. However, our no-free-lunch theorem highlights the necessity of accurately identifying the violated assumption for consistent treatment effect estimation. Through comparative analyses, we show our framework's superiority over existing data fusion methods. The practical utility of our approach is further exemplified by three real-world case studies, underscoring its potential for widespread application in empirical research.
