Distilling interpretable causal trees from causal forests
Patrick Rehill
TL;DR
This work tackles the challenge of extracting interpretable insights from the high-dimensional CATE distributions produced by causal forests. It introduces the Distilled Causal Tree (DCT), which uses knowledge distillation (KD) to learn a single, interpretable tree from a powerful teacher (the causal forest), yielding leaves with doubly robust, asymptotically normal estimates. By pairing KD with an optimal (or evolutionary) tree-fitting approach, the method often outperforms other single-tree extraction methods and, in noisy, high-dimensional settings, can surpass the teacher itself. The approach is demonstrated through simulations on ACIC 2016 data and a real-world canvassing experiment, highlighting its potential to provide actionable, interpretable causal insights for policy and targeting decisions.
Abstract
Machine learning methods for estimating treatment effect heterogeneity promise greater flexibility than existing methods that test a few pre-specified hypotheses. However, it can be challenging to extract insights from complicated machine learning models. A high-dimensional distribution of conditional average treatment effects may give accurate, individual-level estimates, but it can be hard to understand the underlying patterns and hard to know what the implications of the analysis are. This paper proposes the Distilled Causal Tree, a method for distilling a single, interpretable causal tree from a causal forest. The method compares well to existing methods of extracting a single tree, particularly in noisy or high-dimensional data with many correlated features, where it even outperforms the base causal forest in most simulations. Its estimates are doubly robust and asymptotically normal, just as those of the causal forest are.
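To make the distillation idea concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation. A shallow "student" tree chooses its splits by regressing on the teacher's CATE predictions, and each leaf then reports the mean of doubly robust (AIPW) scores. Several pieces are hypothetical stand-ins: the teacher predictions are simulated as the true effect plus noise rather than taken from a fitted causal forest, the outcome models are plain group means, the propensity is the known 0.5 of a randomized experiment, and greedy splitting is swapped in for the optimal or evolutionary tree fitting described above. All function and variable names are illustrative.

```python
import random
from statistics import fmean


def best_split(X, target, idx, min_leaf):
    """Find the split minimising the squared error of `target` over rows `idx`."""
    best = None  # (sse, feature, threshold, left_idx, right_idx)
    for j in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][j])
        total = sum(target[i] for i in order)
        total_sq = sum(target[i] ** 2 for i in order)
        left_sum = left_sq = 0.0
        for k in range(1, len(order)):
            t = target[order[k - 1]]
            left_sum += t
            left_sq += t * t
            n_l, n_r = k, len(order) - k
            if n_l < min_leaf or n_r < min_leaf:
                continue
            if X[order[k - 1]][j] == X[order[k]][j]:
                continue  # cannot split between tied feature values
            sse = (left_sq - left_sum ** 2 / n_l) + (
                (total_sq - left_sq) - (total - left_sum) ** 2 / n_r)
            if best is None or sse < best[0]:
                thr = (X[order[k - 1]][j] + X[order[k]][j]) / 2
                best = (sse, j, thr, order[:k], order[k:])
    return best


def grow(X, teacher, scores, idx, depth, min_leaf):
    """Splits are learned from teacher predictions; leaf values average DR scores."""
    node = {"estimate": fmean([scores[i] for i in idx]), "n": len(idx)}
    split = best_split(X, teacher, idx, min_leaf) if depth > 0 else None
    if split is None:
        return node
    _, j, thr, left, right = split
    node.update(feature=j, threshold=thr,
                left=grow(X, teacher, scores, left, depth - 1, min_leaf),
                right=grow(X, teacher, scores, right, depth - 1, min_leaf))
    return node


def predict(node, x):
    """Walk the tree and return the leaf's estimated treatment effect."""
    while "feature" in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["estimate"]


# Simulated randomized experiment (illustrative data, not from the paper).
random.seed(0)
n = 1000
X = [[random.random() for _ in range(3)] for _ in range(n)]
tau = [2.0 if x[0] > 0.5 else 0.0 for x in X]  # true effect, unknown in practice
W = [1 if random.random() < 0.5 else 0 for _ in range(n)]
Y = [x[1] + w * t + random.gauss(0, 0.5) for x, w, t in zip(X, W, tau)]

# ASSUMPTION: stand-in for causal forest predictions (true effect plus noise).
teacher = [t + random.gauss(0, 0.3) for t in tau]

# AIPW scores with known propensity e and crude constant outcome models.
e = 0.5
mu1 = fmean([y for y, w in zip(Y, W) if w == 1])
mu0 = fmean([y for y, w in zip(Y, W) if w == 0])
scores = [(mu1 - mu0) + w * (y - mu1) / e - (1 - w) * (y - mu0) / (1 - e)
          for y, w in zip(Y, W)]

tree = grow(X, teacher, scores, list(range(n)), depth=2, min_leaf=100)
print("root split: feature", tree["feature"], "at", round(tree["threshold"], 3))
```

Separating the two roles is the design choice worth noting: the smooth teacher predictions guide where to split, while the leaf estimates come from the scores rather than from the teacher, so a leaf's value is an average of per-unit scores to which standard inference can be applied.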
