Distilling interpretable causal trees from causal forests

Patrick Rehill

TL;DR

This work tackles the challenge of extracting interpretable insights from the high-dimensional distributions of conditional average treatment effects (CATEs) produced by causal forests. It introduces the Distilled Causal Tree (DCT), which uses knowledge distillation (KD) to learn a single, interpretable tree from a powerful teacher (the causal forest), yielding leaves with doubly robust, asymptotically normal estimates. By pairing KD with an optimal (or evolutionary) tree-fitting approach, the method compares well with other single-tree extractions and, in noisy, high-dimensional settings with many correlated features, outperforms the teacher itself in most simulations. The approach is demonstrated through simulations on ACIC 2016 data and a real-world application estimating the effect of a cash transfer on maths scores, highlighting its potential to provide actionable, interpretable causal insights for policy and targeting decisions.

Abstract

Machine learning methods for estimating treatment effect heterogeneity promise greater flexibility than existing methods that test a few pre-specified hypotheses. However, it can be challenging to extract insights from complicated machine learning models. A high-dimensional distribution of conditional average treatment effects may give accurate, individual-level estimates, but it can be hard to understand the underlying patterns and hard to know what the implications of the analysis are. This paper proposes the Distilled Causal Tree, a method for distilling a single, interpretable causal tree from a causal forest. The method compares well to existing methods of extracting a single tree, particularly in noisy or high-dimensional data where there are many correlated features. In these settings it even outperforms the base causal forest in most simulations. Its estimates are doubly robust and asymptotically normal, just as those of the causal forest are.
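The core distillation idea can be illustrated with a minimal sketch: a teacher produces unit-level effect estimates, and a shallow student tree is then fitted to those estimates rather than to the raw outcomes. Everything in the block is an assumption made for illustration and not the paper's implementation. The simulated data, the k-nearest-neighbour stand-in teacher (used in place of a causal forest), the greedy split search over a coarse grid (in place of an optimal or evolutionary search), and all function names are hypothetical. The doubly robust re-estimation of leaf effects is omitted; leaves here simply hold the mean teacher prediction.

```python
import random
import statistics


def simulate(n, seed=0):
    """Randomised experiment: the true effect is 2 when x[0] > 0.5, else 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.random() for _ in range(3)]
        w = 1 if rng.random() < 0.5 else 0
        tau = 2.0 if x[0] > 0.5 else 0.0
        y = x[1] + tau * w + rng.gauss(0.0, 0.5)
        rows.append((x, w, y))
    return rows


def teacher_cate(rows, k=20):
    """Stand-in teacher (NOT a causal forest): k-NN difference in arm means."""
    def knn_mean(x, arm):
        pool = sorted((sum((a - b) ** 2 for a, b in zip(x, xi)), yi)
                      for xi, wi, yi in rows if wi == arm)
        return statistics.fmean(y for _, y in pool[:k])
    return [knn_mean(x, 1) - knn_mean(x, 0) for x, _, _ in rows]


def sse(values):
    """Sum of squared deviations from the mean."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(X, target, depth, min_leaf=20):
    """Greedy regression tree fitted to the teacher's predictions."""
    best = None
    if depth > 0:
        for j in range(len(X[0])):
            for cut in [c / 10 for c in range(1, 10)]:  # coarse threshold grid
                left = [i for i, x in enumerate(X) if x[j] <= cut]
                right = [i for i, x in enumerate(X) if x[j] > cut]
                if min(len(left), len(right)) < min_leaf:
                    continue
                loss = (sse([target[i] for i in left])
                        + sse([target[i] for i in right]))
                if best is None or loss < best[0]:
                    best = (loss, j, cut, left, right)
    if best is None:
        # Leaf value is the mean teacher prediction in this region.
        return {"leaf": True, "value": statistics.fmean(target)}
    _, j, cut, left, right = best
    return {
        "leaf": False, "feature": j, "cut": cut,
        "left": fit_tree([X[i] for i in left], [target[i] for i in left],
                         depth - 1, min_leaf),
        "right": fit_tree([X[i] for i in right], [target[i] for i in right],
                          depth - 1, min_leaf),
    }


def predict(tree, x):
    """Route a feature vector down the tree to its leaf value."""
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["feature"]] <= tree["cut"] else tree["right"]
    return tree["value"]


rows = simulate(400)
X = [x for x, _, _ in rows]
tau_hat = teacher_cate(rows)            # teacher's unit-level effect estimates
student = fit_tree(X, tau_hat, depth=1)  # distilled single-split tree
```

Because the student is trained on the teacher's smoothed estimates instead of noisy outcomes, its split search sees a cleaner signal. In this toy setting the student would be expected to split on the first feature, where the simulated effect changes.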

Paper Structure (17 sections, 5 equations, 5 figures, 4 tables)

Introduction
Problem set-up
Potential outcomes and causal inference
Knowledge distillation
A knowledge distillation perspective on causal inference
Applying KD to the causal forest
How KD may improve performance where there are many noisy and correlated features
The Distilled Causal Tree
Further enhancing KD with an optimal tree algorithm
Greedy and optimal trees
How evolutionary trees work
Limitations of evolutionary trees
Simulation study
Set-up
ACIC simulations
...and 2 more sections

Figures (5)

Figure 1: Ground truth mean absolute error results on original ACIC data. The grey histogram shows predictions of the (pruned) individual trees in the ensemble, while the lines show the specific models for comparison.
Figure 2: Ground truth mean absolute error results with noise and correlated features introduced into $X$. The grey histogram shows predictions of the (pruned) individual trees in the ensemble, while the lines show the specific models for comparison.
Figure 3: Simulation results on original ACIC data with R-Loss. The grey histogram shows predictions of the (pruned) individual trees in the ensemble, while the lines show the specific models for comparison.
Figure 4: Simulation results with noise and correlated features introduced into $X$ with R-Loss. The grey histogram shows predictions of the (pruned) individual trees in the ensemble, while the lines show the specific models for comparison.
Figure 5: The DCT for the effect of a cash transfer on maths scores.
