Estimating Causal Effects with Double Machine Learning -- A Method Evaluation

Jonathan Fuhr, Philipp Berens, Dominik Papies

TL;DR

The paper evaluates double/debiased machine learning (DML) for causal effect estimation from observational data, highlighting how cross-fitting and orthogonalization enable flexible ML models to adjust for nonlinear confounding under identification assumptions. It conducts extensive simulations across functional forms, confounder counts, sample sizes, and noisy variables, showing that flexible ML methods (notably XGBoost, GAMs, random forests, and neural networks) often outperform linear approaches and that predictive accuracy in the first stage signals reliability of causal estimates. In a real-world hedonic housing-price application, DML demonstrates stability when varying repeats and suggests that more flexible specifications can yield stronger estimated effects than traditional specifications, though results depend on the assumed causal structure and identification validity. The study offers actionable guidance on when and how to apply DML, emphasizing algorithm choice, cross-fitting, repeat runs, and covariate categorization, while acknowledging limitations and areas for future research, including panel data extensions and IV settings.

Abstract

The estimation of causal effects with observational data continues to be a very active research area. In recent years, researchers have developed new frameworks which use machine learning to relax classical assumptions necessary for the estimation of causal effects. In this paper, we review one of the most prominent methods - "double/debiased machine learning" (DML) - and empirically evaluate it by comparing its performance on simulated data relative to more traditional statistical methods, before applying it to real-world data. Our findings indicate that the application of a suitably flexible machine learning algorithm within DML improves the adjustment for various nonlinear confounding relationships. This advantage enables a departure from traditional functional form assumptions typically necessary in causal effect estimation. However, we demonstrate that the method continues to critically depend on standard assumptions about causal structure and identification. When estimating the effects of air pollution on housing prices in our application, we find that DML estimates are consistently larger than estimates of less flexible methods. From our overall results, we provide actionable recommendations for specific choices researchers must make when applying DML in practice.
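
To make the two ingredients named above concrete (orthogonalization and cross-fitting), here is a minimal sketch of DML for a partially linear model. It is not the authors' implementation. It assumes a single confounder with U-shaped confounding, a true effect of 1, and a toy piecewise-constant learner standing in for the flexible ML algorithms the paper evaluates. All function names (`fit_binned_mean`, `dml_plr`, `simulate`, `naive_slope`) and the simulated data-generating process are hypothetical and chosen only for illustration.

```python
import random


def fit_binned_mean(xs, ys, n_bins=60, lo=-3.0, hi=3.0):
    """Toy nuisance learner (an assumption of this sketch): predicts the
    mean of y within equal-width bins of a single confounder x."""
    width = (hi - lo) / n_bins

    def bin_of(x):
        return min(n_bins - 1, max(0, int((x - lo) / width)))

    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = bin_of(x)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]


def dml_plr(x, w, y, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of the effect of w on y."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    w_res, y_res = [0.0] * n, [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Nuisance models are fit on the other folds only (cross-fitting).
        m_hat = fit_binned_mean([x[i] for i in train], [w[i] for i in train])
        l_hat = fit_binned_mean([x[i] for i in train], [y[i] for i in train])
        # Orthogonalization: keep what the confounder cannot predict.
        for i in fold:
            w_res[i] = w[i] - m_hat(x[i])
            y_res[i] = y[i] - l_hat(x[i])
    # Final stage: regress outcome residuals on treatment residuals.
    return sum(a * b for a, b in zip(w_res, y_res)) / sum(a * a for a in w_res)


def naive_slope(w, y):
    """Slope of y on w with no confounder adjustment, for comparison."""
    mw, my = sum(w) / len(w), sum(y) / len(y)
    cov = sum((a - mw) * (b - my) for a, b in zip(w, y))
    return cov / sum((a - mw) ** 2 for a in w)


def simulate(n=4000, beta=1.0, seed=1):
    """Hypothetical data: x confounds w and y through x**2."""
    rng = random.Random(seed)
    x = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    w = [xi ** 2 + rng.gauss(0.0, 1.0) for xi in x]
    y = [beta * wi + 2.0 * xi ** 2 + rng.gauss(0.0, 1.0) for xi, wi in zip(x, w)]
    return x, w, y


if __name__ == "__main__":
    x, w, y = simulate()
    print("naive slope:", naive_slope(w, y))
    print("DML estimate:", dml_plr(x, w, y))
```

In this simulated setting the unadjusted slope absorbs the confounding through `x**2`, while the cross-fitted residual-on-residual regression should land close to the true effect. The estimate is only as good as the nuisance learner and the assumed causal structure, which mirrors the paper's caveat about identification.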

Paper Structure (30 sections, 7 equations, 21 figures, 8 tables, 1 algorithm)

Figures (21)

  • Figure 1: Overview of DML applications in the literature. A Discipline the application was published in. B Different ML algorithms used within DML. C Treatment type considered in application. D Dimensionality: ratio of the number of variables to the number of observations. E Number of folds the data is split into within DML. F Number of algorithm repetitions for increased robustness.
  • Figure 2: Directed acyclic graph (DAG) for the assumed causal structure. $W$: treatment variable, $Y$: outcome variable, $\boldsymbol{X_c}$: observed confounding variables. The relationships between $\boldsymbol{X_c}$ and $W$ ($m_0()$), and $\boldsymbol{X_c}$ and $Y$ ($g_0()$), are potentially complex and nonlinear.
  • Figure 3: Possible violations of unconfoundedness.
  • Figure 4: Results for our baseline simulation with sample size $n=1000$. The horizontal axis displays the different methods from Table \ref{tab:methods}. The vertical axis depicts the estimated coefficient. The dashed line marks the true causal effect ($\beta = 1$). The boxplots show the distribution of estimated coefficients across 100 simulated datasets for each method.
  • Figure 5: Results for Case 1 - distribution of estimated coefficients for each method across 100 simulations by functional form (outliers not displayed). The dashed line marks the true causal effect ($\beta = 1$). A Linear confounding. B U-shaped/squared confounding. C Pairwise interactions between confounders. D Confounding via step function. E Cubic confounding. F Confounding functional form drawn randomly for each confounder.
  • ...and 16 more figures