Honesty in Causal Forests: When It Helps and When It Hurts

Yanfang Hou, Carlos Fernández-Loría

Abstract

Causal forests estimate how treatment effects vary across individuals, guiding personalized interventions in areas like marketing, operations, and public policy. A standard modeling practice with this method is honest estimation: dividing the data into two samples, one to define subgroups and another to estimate treatment effects within them. This is intended to reduce overfitting and is the default in many software packages. But is it the right choice? In this paper, we show that honest estimation can reduce the accuracy of individual-level treatment effect estimates, especially when there are substantial differences in how individuals respond to treatment, and the data is rich enough to uncover those differences. The core issue is a classic bias-variance trade-off: honesty lowers the risk of overfitting but increases the risk of underfitting, because it limits the data available to detect and model heterogeneity. Across 7,500 benchmark datasets, we find that the cost of using honesty by default can be as high as requiring 25% more data to match the performance of models trained without it. We argue that honesty is best understood as a form of regularization and its use should be guided by application goals and empirical evaluation, not adopted reflexively.
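The sample-splitting idea described in the abstract can be illustrated with a minimal sketch. It assumes synthetic data from a randomized experiment and a single one-split tree rather than a full forest. The function names (`simulate`, `best_split`, `fit_stump`, `predict`), the data-generating process, and the simplified splitting criterion are all hypothetical choices made for illustration; this is not the authors' code or the API of any causal forest package.

```python
import numpy as np

def simulate(n, rng, effect_gap=2.0):
    """Hypothetical randomized experiment: the effect depends on the sign of x[:, 0]."""
    x = rng.normal(size=(n, 3))
    t = rng.integers(0, 2, size=n)                 # randomized binary treatment
    tau = np.where(x[:, 0] > 0, effect_gap, 0.0)   # true individual-level effect
    y = x[:, 1] + tau * t + rng.normal(size=n)
    return x, t, y, tau

def diff_in_means(t, y):
    """Subgroup effect estimate: treated mean minus control mean."""
    treated, control = y[t == 1], y[t == 0]
    if treated.size == 0 or control.size == 0:
        return np.nan
    return treated.mean() - control.mean()

def best_split(x, t, y, min_leaf=20):
    """Choose (feature, threshold) maximizing the squared gap between the two
    child effect estimates. This is a simplified criterion (an assumption)."""
    best_j, best_thr, best_score = None, None, -np.inf
    for j in range(x.shape[1]):
        for thr in np.quantile(x[:, j], np.linspace(0.1, 0.9, 9)):
            left = x[:, j] <= thr
            if min(left.sum(), (~left).sum()) < min_leaf:
                continue
            gap = diff_in_means(t[left], y[left]) - diff_in_means(t[~left], y[~left])
            if gap ** 2 > best_score:              # NaN compares False, so it is skipped
                best_j, best_thr, best_score = j, thr, gap ** 2
    if best_j is None:
        raise ValueError("no valid split found")
    return best_j, best_thr

def fit_stump(x, t, y, honest, rng):
    """Fit a one-split tree. With honest=True, one half of the data defines the
    subgroups and the other half estimates the effects within them."""
    idx = rng.permutation(len(y))
    if honest:
        half = len(y) // 2
        split_idx, est_idx = idx[:half], idx[half:]
    else:
        split_idx = est_idx = idx                  # same data used for both steps
    j, thr = best_split(x[split_idx], t[split_idx], y[split_idx])
    left = x[est_idx, j] <= thr
    tau_left = diff_in_means(t[est_idx][left], y[est_idx][left])
    tau_right = diff_in_means(t[est_idx][~left], y[est_idx][~left])
    return j, thr, tau_left, tau_right

def predict(stump, x):
    """Assign each individual the effect estimate of the leaf it falls into."""
    j, thr, tau_left, tau_right = stump
    return np.where(x[:, j] <= thr, tau_left, tau_right)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, t, y, tau = simulate(2000, rng)
    for honest in (False, True):
        stump = fit_stump(x, t, y, honest, rng)
        mse = np.mean((predict(stump, x) - tau) ** 2)
        print(f"honest={honest}: split on feature {stump[0]}, MSE={mse:.3f}")
```

In this sketch the honest variant sees only half of the observations at each step, which is the mechanism the abstract points to: fewer observations are available both to detect heterogeneity and to estimate effects. A single run on one simulated dataset should not be read as evidence for either setting.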

Paper Structure

This paper contains 34 sections, 122 equations, 8 figures.

Figures (8)

  • Figure 1: Bias comparison of honest estimation (HE) and adaptive estimation (AE). HE is unbiased conditional on a correct split, while AE exhibits selection bias. However, AE selects the informative feature more often, reducing approximation bias and producing an estimate closer to the conditional average treatment effect (CATE) in expectation.
  • Figure 2: Target coupling as a function of the signal-to-noise ratio (SNR). HE decreases the probability of an informative split compared to AE. In low-SNR settings (left), this moves trees toward stable, uninformative targets and reduces target coupling. In high-SNR settings (right), the same shift moves trees away from reliable and informative splits, increasing sampling sensitivity and target coupling.
  • Figure 3: Proportion of signal ($S^2$) captured by each selection strategy across SNR deciles. AE outperforms HE in most deciles, with the gap widening as SNR increases. The CV-based strategy closely tracks AE and approaches Oracle performance in high-SNR settings.
  • Figure 4: Proportion of datasets where each strategy significantly outperforms the other (5% level), by SNR decile. AE wins more in every decile. Neither method is universally better.
  • Figure 5: Additional data required for HE to match AE performance (5% significance level), by SNR decile. On average, honest estimation requires 1.6% to 25% more data to match the performance of causal forests trained without it.
  • ...and 3 more figures