Hierarchical Bias-Driven Stratification for Interpretable Causal Effect Estimation

Lucile Ter-Minassian, Liran Szlak, Ehud Karavani, Chris Holmes, Yishai Shimoni

TL;DR

This work addresses the need for transparent causal effect estimation from observational data by introducing BICauseTree, an interpretable balancing method that uses a decision tree to discover local natural experiments and identify positivity violations. Splits are chosen to reduce covariate imbalance, measured by the Absolute Standardized Mean Difference (ASMD). The tree is then pruned and filtered for positivity violations, and effects are estimated within each leaf using flexible outcome or propensity models. The result is a covariate-based definition of the target population for which inference is possible. Across synthetic and realistic benchmark datasets, BICauseTree achieves competitive bias with interpretable partitions, abstains in regions with poor overlap, and supports policy-relevant inferences defined in terms of covariates. The approach combines transparency with practical performance and comes with open-source code, which eases adoption in high-stakes domains where trust and interpretability are crucial.

Abstract

Interpretability and transparency are essential for incorporating causal effect models from observational data into policy decision-making. They can build trust in a model when no ground-truth labels are available to evaluate its accuracy. To date, attempts at transparent causal effect estimation have consisted of applying post hoc explanation methods to black-box models, which are not themselves interpretable. Here, we present BICauseTree: an interpretable balancing method that identifies clusters where natural experiments occur locally. Our approach builds on decision trees with a customized objective function to improve balancing and reduce treatment allocation bias. Consequently, it can also detect subgroups presenting positivity violations, exclude them, and provide a covariate-based definition of the target population we can infer from and generalize to. We evaluate the method's performance using synthetic and realistic datasets, explore its bias-interpretability tradeoff, and show that it is comparable with existing approaches.
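The split criterion mentioned above, choosing the covariate with the largest Absolute Standardized Mean Difference (ASMD) between treatment arms, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `asmd` and `most_imbalanced_feature` are hypothetical, the pooled standard deviation is one common convention assumed here, and each arm is assumed to contain at least two samples.

```python
import numpy as np


def asmd(x, t):
    """Absolute standardized mean difference of one covariate between arms.

    Assumes a pooled-variance denominator (average of the two arm variances)
    and at least two samples per arm; the paper may use a different convention.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=bool)
    treated, control = x[t], x[~t]
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2.0)
    if pooled_sd == 0:
        # Constant covariate in both arms: treat as perfectly balanced.
        return 0.0
    return abs(treated.mean() - control.mean()) / pooled_sd


def most_imbalanced_feature(X, t):
    """Return the column index with the largest ASMD, plus all ASMD scores."""
    X = np.asarray(X, dtype=float)
    scores = np.array([asmd(X[:, j], t) for j in range(X.shape[1])])
    return int(scores.argmax()), scores


# Toy data: column 0 is balanced across arms, column 1 is strongly imbalanced.
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0],
              [1.0, 5.0], [2.0, 6.0], [3.0, 5.0]])
t = np.array([0, 0, 0, 1, 1, 1])
best, scores = most_imbalanced_feature(X, t)
print(best, scores)
```

In a recursive tree, a step like this would be repeated within each node on the samples that reach it; how the split threshold on the chosen feature is set is not described in this summary and is left out of the sketch.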

Paper Structure (77 sections, 13 equations, 29 figures, 1 table, 2 algorithms)

Figures (29)

  • Figure 1: Estimation bias for (left) the natural experiment dataset and (right) the positivity violations dataset across 50 subsamples, with $N = 20,000$. In the natural experiment (left), on top of being transparent, BICauseTree has lower bias in causal effect estimation compared to all other methods, excluding IPW(GBT) which has comparable performance. In the positivity violation experiment (right), after filtering violating samples the effect estimation by BICauseTree remains unbiased and with low variance.
  • Figure 2: Estimation bias for (left) the twins dataset and (right) the ACIC dataset across 50 subsamples, with $N = 20,000$, excluding positivity-violating leaf nodes. For the twins dataset (left), the BICauseTree(Marginal) estimator is less biased than the marginal estimator. Augmenting our tree with an IPW outcome model (BICauseTree(IPW)) further decreases estimation bias, making it comparable with IPW with respect to both bias and estimation variance. For the ACIC dataset (right), both BICauseTree models are comparable with IPW with respect to estimation bias and variance.
  • Figure 3: Estimation bias for BICauseTree(Marginal) with varying maximum depth parameter, and average bias of IPW (dotted), on the twins training set. The estimation bias in leaf nodes decreases as the maximum depth of our tree increases, and remains stable for depths beyond 5.
  • Figure 4: Calibration of propensity score, twins dataset. Though logistic regression, which has better data efficiency, has less noisy calibration, BICauseTree still shows satisfactory calibration on average.
  • Figure A1: Comparison of estimation bias (left, absolute difference) and imbalance (right, maximum absolute standardized mean difference [ASMD]) using two flavors of BICauseTree. The first is the original version from the manuscript, in which, at each recursion, the feature to split on is the one with maximal ASMD. The second is a variant where features are selected at random, with all other hyperparameters held constant. Selecting the most imbalanced feature for stratification leads to better balancing and, more importantly, better estimation, which justifies selecting features based on the highest ASMD.
  • ...and 24 more figures
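The leaf-wise marginal estimation with exclusion of positivity-violating leaves, as referred to in the Figure 2 caption, might be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the function `leafwise_ate`, the parameter `min_arm_frac`, and the overlap rule (flag a leaf when either arm makes up less than a fixed fraction of its samples) are all hypothetical, and the leaf assignments are taken as given integer labels.

```python
import numpy as np


def leafwise_ate(leaf, t, y, min_arm_frac=0.1):
    """Size-weighted average of per-leaf differences in mean outcome.

    Leaves where either treatment arm is rarer than `min_arm_frac`
    (a hypothetical overlap rule, assumed > 0) are excluded, so the
    estimate only covers the remaining, covariate-defined population.
    """
    leaf = np.asarray(leaf)
    t = np.asarray(t, dtype=bool)
    y = np.asarray(y, dtype=float)
    effects, sizes, excluded = {}, {}, []
    for leaf_id in np.unique(leaf):
        in_leaf = leaf == leaf_id
        frac_treated = t[in_leaf].mean()
        if min(frac_treated, 1.0 - frac_treated) < min_arm_frac:
            # Abstain: too few treated or control samples in this leaf.
            excluded.append(int(leaf_id))
            continue
        effects[int(leaf_id)] = y[in_leaf & t].mean() - y[in_leaf & ~t].mean()
        sizes[int(leaf_id)] = int(in_leaf.sum())
    if not effects:
        return float("nan"), effects, excluded
    total = sum(sizes.values())
    ate = sum(effects[k] * sizes[k] for k in effects) / total
    return ate, effects, excluded


# Toy example: leaf 2 contains only treated samples and is excluded.
leaf = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
t = [0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
y = [1, 1, 3, 3, 0, 1, 1, 0, 5, 5, 5, 5]
ate, effects, excluded = leafwise_ate(leaf, t, y)
print(ate, effects, excluded)
```

Returning the excluded leaves alongside the estimate keeps the abstention explicit: a reader can see which covariate-defined subgroups the reported effect does not cover.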