Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Stefan Wager, Susan Athey

TL;DR

This paper develops causal forests, a non-parametric method extending Breiman's random forests to estimate heterogeneous treatment effects under unconfoundedness. It proves pointwise consistency and asymptotic normality for the conditional average treatment effect $\tau(x)$ and provides a practical, consistent variance estimator via the infinitesimal jackknife, applicable to a broad class of forest variants including honest and double-sample trees. Through simulations, causal forests outperform classical nearest-neighbor methods in mean-squared error and achieve nominal confidence interval coverage in moderate to high dimensions, highlighting their practical value for individualized treatment decisions. The work thus enables reliable, data-driven inference on treatment effect heterogeneity in high-dimensional settings, with implications for personalized medicine, policy evaluation, and targeted marketing.
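
For reference, the estimand can be written out as follows. This is a sketch in standard potential-outcomes notation; the symbols $Y_i^{(0)}$, $Y_i^{(1)}$ (potential outcomes) and $W_i$ (treatment indicator) are assumed here and do not appear elsewhere in this summary.

$$\tau(x) = \mathbb{E}\left[Y_i^{(1)} - Y_i^{(0)} \,\middle|\, X_i = x\right], \qquad \left\{Y_i^{(0)}, \, Y_i^{(1)}\right\} \perp W_i \,\mid\, X_i \quad \text{(unconfoundedness)}.$$

Asymptotic normality then means that the standardized error of the forest estimate $\hat{\tau}(x)$ converges in distribution to a standard Gaussian:

$$\frac{\hat{\tau}(x) - \tau(x)}{\sqrt{\operatorname{Var}\left[\hat{\tau}(x)\right]}} \Rightarrow \mathcal{N}(0, \, 1).$$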

Abstract

Many scientific and engineering challenges -- ranging from personalized medicine to customized marketing recommendations -- require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially in the presence of irrelevant covariates.
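
To make the nearest-neighbor baseline concrete, here is a minimal sketch of a $k$-NN treatment effect estimate at a single point: the mean outcome of the $k$ nearest treated examples minus the mean outcome of the $k$ nearest control examples. The function name `knn_treatment_effect`, the Euclidean distance, and the synthetic data are assumptions made for illustration; this is not the authors' code, and it is not the causal forest itself.

```python
import numpy as np


def knn_treatment_effect(x, X, Y, W, k=10):
    """k-NN matching estimate of tau(x) (hypothetical helper, illustration only).

    x: query point of shape (d,); X: features (n, d); Y: outcomes (n,);
    W: binary treatment indicator (n,), 1 = treated, 0 = control.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W).astype(bool)
    # Euclidean distance from every training point to the query point.
    dist = np.linalg.norm(X - np.asarray(x, dtype=float), axis=1)

    def nearest_mean(mask):
        idx = np.flatnonzero(mask)
        if idx.size < k:
            raise ValueError("fewer than k observations in one treatment arm")
        # Indices of the k closest points within this arm.
        nearest = idx[np.argsort(dist[idx])[:k]]
        return Y[nearest].mean()

    return nearest_mean(W) - nearest_mean(~W)


# Synthetic randomized data with a constant treatment effect of 2
# (made up for this example, not taken from the paper).
rng = np.random.default_rng(0)
n, d = 2000, 2
X = rng.uniform(size=(n, d))
W = rng.integers(0, 2, size=n)
Y = X[:, 0] + 2.0 * W + 0.1 * rng.normal(size=n)
tau_hat = knn_treatment_effect(np.full(d, 0.5), X, Y, W, k=50)
```

Because every coordinate enters the distance equally, irrelevant covariates dilute the neighborhoods, which is the weakness the abstract points to.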

Paper Structure

This paper contains 26 sections, 13 theorems, 137 equations, 4 figures, 5 tables.

Key Result

Theorem 1

Suppose that we have $n$ independent and identically distributed training examples $Z_i = \left(X_i, \, Y_i\right) \in [0, \, 1]^d \times \mathbb{R}$. Suppose moreover that the features are independently and uniformly distributed. (The result also holds with a density that is bounded away from 0 and i...) Then, random forest predictions are asymptotically Gaussian: ... Moreover, the asymptotic variance ...

Figures (4)

  • Figure 1: Graphical diagnostics for causal forests in the setting of \ref{eq:prop_setup}. The first two panels evaluate the sampling error of causal forests and our infinitesimal jackknife estimate of variance over 1,000 randomly drawn test points, with $d = 20$. The right-most panel shows standardized Gaussian QQ-plots for predictions at the same 1,000 test points, with $n = 800$ and $d = 20$. The first two panels are computed over 50 randomly drawn training sets, and the last one over 20 training sets.
  • Figure 2: The true treatment effect $\tau(X_i)$ at 10,000 random test examples $X_i$, along with estimates $\hat{\tau}(X_i)$ produced by a causal forest and optimally-tuned $k$-NN, on data drawn according to \ref{eq:tau_setup} with $d = 6, \, 20$. The test points are plotted according to their first two coordinates; the treatment effect is denoted by color, from dark (low) to light (high). On this simulation instance, causal forests and $k^*$-NN had a mean-squared error of 0.03 and 0.13 respectively for $d = 6$, and of 0.05 and 0.62 respectively for $d = 20$. The optimal tuning choices for $k$-NN were $k^* = 39$ for $d = 6$, and $k^* = 24$ for $d = 20$.
  • Figure 3: Comparison of the performance of honest and adaptive causal forests when predicting at $x_0 = (0, \, 0, \, \ldots, \, 0)$, which is a corner of the support of the features $X_i$. Both forests have $B = 500$ trees, and use a leaf-size of $k = 1$. We use a subsample size $s = n^{0.8}$ for adaptive forests and $s = 2 \, n^{0.8}$ for honest forests. All results are averaged over 40 replications; we report both bias and root-mean-squared error (RMSE).
  • Figure 4: Comparison of the root-mean-squared error of honest and adaptive forests in the setting of Table \ref{tab:tau0_simu}, with $d = 8$. Honest forests use $s = 2500$ (i.e., $\left\lvert\mathcal{I}\right\rvert = 1250$) while adaptive forests use $s = 1250$, such that both methods grow trees of the same depth. Both forests have $B = 500$, and results are averaged over 100 simulation replications.
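
The subsample sizes quoted in the captions for Figures 3 and 4 can be checked with a little arithmetic. The sketch below assumes that an honest (double-sample) tree splits its subsample of size $s$ into two halves, one of which is $\mathcal{I}$, which matches the $s = 2500$, $\left\lvert\mathcal{I}\right\rvert = 1250$ pairing in Figure 4. The function names, the rounding to an integer, and the cap at $n$ are illustrative assumptions, not details taken from the paper.

```python
def split_honest(s):
    """Split a subsample of size s into two halves (hypothetical helper).

    Returns (size of I, size of the other half); the sizes differ by at
    most one when s is odd.
    """
    half = s // 2
    return half, s - half


def subsample_size(n, beta=0.8, honest=True):
    """Subsample size s = n**beta (adaptive) or 2 * n**beta (honest).

    Rounding to the nearest integer and capping at n are assumptions
    made so that the result is a usable sample size.
    """
    base = int(round(n ** beta))
    s = 2 * base if honest else base
    return min(s, n)


# Figure 4 setting: an honest forest with s = 2500 has |I| = 1250,
# the same size as the adaptive forest's full subsample.
i_size, other_size = split_honest(2500)

# Figure 3 rule for a hypothetical n = 1000.
s_adaptive = subsample_size(1000, honest=False)
s_honest = subsample_size(1000, honest=True)
```

Doubling $s$ for honest forests gives each half as many points as an adaptive tree's whole subsample, which is why the two methods grow trees of the same depth.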

Theorems & Definitions (37)

  • Remark 1
  • Remark 2
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Theorem 1
  • Remark 3: binary classification
  • Lemma 2
  • ...and 27 more