Estimating Causal Effects with Double Machine Learning -- A Method Evaluation
Jonathan Fuhr, Philipp Berens, Dominik Papies
TL;DR
The paper evaluates double/debiased machine learning (DML) for causal effect estimation from observational data, highlighting how cross-fitting and orthogonalization allow flexible ML models to adjust for nonlinear confounding, provided the identification assumptions hold. It conducts extensive simulations across functional forms, confounder counts, sample sizes, and noisy variables, showing that flexible ML methods (notably XGBoost, GAMs, random forests, and neural networks) often outperform linear approaches and that predictive accuracy in the first stage signals the reliability of the causal estimates. In a real-world hedonic housing-price application, DML estimates remain stable as the number of repeat runs varies, and more flexible specifications yield larger estimated effects than traditional specifications, though the results depend on the assumed causal structure and on the validity of identification. The study offers actionable guidance on when and how to apply DML, emphasizing algorithm choice, cross-fitting, repeat runs, and covariate categorization, while acknowledging limitations and areas for future research, including extensions to panel data and IV settings.
Abstract
The estimation of causal effects with observational data continues to be a very active research area. In recent years, researchers have developed new frameworks which use machine learning to relax classical assumptions necessary for the estimation of causal effects. In this paper, we review one of the most prominent methods - "double/debiased machine learning" (DML) - and empirically evaluate it by comparing its performance on simulated data relative to more traditional statistical methods, before applying it to real-world data. Our findings indicate that the application of a suitably flexible machine learning algorithm within DML improves the adjustment for various nonlinear confounding relationships. This advantage enables a departure from traditional functional form assumptions typically necessary in causal effect estimation. However, we demonstrate that the method continues to critically depend on standard assumptions about causal structure and identification. When estimating the effects of air pollution on housing prices in our application, we find that DML estimates are consistently larger than estimates of less flexible methods. From our overall results, we provide actionable recommendations for specific choices researchers must make when applying DML in practice.
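
To make the comparison described above concrete, the following is a minimal sketch of DML for a partially linear model, contrasted with a linear adjustment on simulated data. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the data-generating process, the function names `fit_predict_poly` and `dml_plr`, and the polynomial least-squares learner (a simple stand-in for a flexible algorithm such as XGBoost or a random forest) are all hypothetical choices made for brevity.

```python
import numpy as np

def fit_predict_poly(X_train, y_train, X_test, degree=3):
    """Hypothetical stand-in for a flexible learner:
    least squares on per-feature polynomial terms up to `degree`."""
    def basis(X):
        return np.column_stack([np.ones(len(X))] + [X ** p for p in range(1, degree + 1)])
    coef, *_ = np.linalg.lstsq(basis(X_train), y_train, rcond=None)
    return basis(X_test) @ coef

def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted, orthogonalized estimate of theta in
    y = theta * d + g(X) + e,  d = m(X) + v."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    y_res, d_res = np.empty(n), np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Cross-fitting: nuisance functions are fit on the other folds only.
        y_res[test_idx] = y[test_idx] - fit_predict_poly(X[train], y[train], X[test_idx])
        d_res[test_idx] = d[test_idx] - fit_predict_poly(X[train], d[train], X[test_idx])
    # Orthogonalization: regress outcome residuals on treatment residuals.
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = (y_res - theta * d_res) * d_res
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(d_res ** 2)
    return theta, se

# Assumed simulation: the confounder X[:, 0] enters treatment and outcome quadratically.
rng = np.random.default_rng(42)
n, theta_true = 2000, 1.0
X = rng.uniform(-2, 2, size=(n, 2))
d = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(size=n)
y = theta_true * d + 2 * X[:, 0] ** 2 + X[:, 1] + rng.normal(size=n)

theta_dml, se_dml = dml_plr(y, d, X)

# Traditional benchmark: OLS of y on d with X entering only linearly.
Z = np.column_stack([np.ones(n), d, X])
theta_ols = np.linalg.lstsq(Z, y, rcond=None)[0][1]

print(f"true effect: {theta_true:.2f}")
print(f"DML estimate: {theta_dml:.3f} (se {se_dml:.3f})")
print(f"linear OLS estimate: {theta_ols:.3f}")
```

In this assumed setup the linear specification cannot absorb the quadratic confounding, so its estimate is biased away from the true effect, whereas the cross-fitted residual-on-residual regression recovers it closely. The sketch uses a single sample split; repeating the procedure over several random splits and aggregating the estimates is a common refinement.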
