Multiply-Robust Causal Change Attribution

Victor Quintas-Martinez, Mohammad Taha Bahadori, Eduardo Santiago, Jeff Mu, Dominik Janzing, David Heckerman

TL;DR

This paper tackles causal change attribution: quantifying how much each causal mechanism contributes to the difference in an outcome distribution between two samples. It introduces a multiply robust estimation framework that combines regression and re-weighting to identify counterfactual distributions under a fixed causal DAG, and proves consistency and asymptotic normality under weak conditions on the machine-learning estimators. The authors show how to estimate per-mechanism contributions via Shapley values or along causal paths, with theoretical guarantees that the attribution measures inherit the estimator’s large-sample properties. Empirically, the method is robust in Monte Carlo simulations and yields interpretable results in a gender wage-gap study. It is implemented in the Python library DoWhy.

Abstract

Comparing two samples of data, we observe a change in the distribution of an outcome variable. In the presence of multiple explanatory variables, how much of the change can be explained by each possible cause? We develop a new estimation strategy that, given a causal model, combines regression and re-weighting methods to quantify the contribution of each causal mechanism. Our proposed methodology is multiply robust, meaning that it still recovers the target parameter under partial misspecification. We prove that our estimator is consistent and asymptotically normal. Moreover, it can be incorporated into existing frameworks for causal attribution, such as Shapley values, which will inherit the consistency and large-sample distribution properties. Our method demonstrates excellent performance in Monte Carlo simulations, and we show its usefulness in an empirical application. Our method is implemented as part of the Python library DoWhy (arXiv:2011.04216, arXiv:2206.06821).
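
To make the Shapley-value attribution mentioned in the abstract concrete, here is a minimal sketch that computes exact Shapley values from a set function over mechanisms. The function name `shapley_values`, the mechanism labels `"X"` and `"Y|X"`, and the numbers in `toy_value` are all hypothetical and chosen for illustration; in the paper's setting, each value would be an estimated counterfactual quantity in which only the listed mechanisms are shifted from one sample to the other.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a set function `value` defined on frozensets of `players`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical counterfactual means: the key lists the mechanisms that were
# switched from sample 0 to sample 1 (made-up numbers, not from the paper).
toy_value = {
    frozenset(): 10.0,
    frozenset({"X"}): 12.0,
    frozenset({"Y|X"}): 11.0,
    frozenset({"X", "Y|X"}): 14.0,
}

phi = shapley_values(["X", "Y|X"], toy_value.__getitem__)
total_change = toy_value[frozenset({"X", "Y|X"})] - toy_value[frozenset()]
```

By construction the contributions in `phi` add up to `total_change`, which is the property that makes Shapley values a natural way to split an overall change across mechanisms. The cost grows exponentially in the number of mechanisms, so this exact enumeration is only practical for small causal graphs.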

Paper Structure (37 sections, 7 theorems, 60 equations, 7 figures, 2 tables)

Key Result

Lemma 2.1

Under the regularity conditions given in the appendix, we have the following identification results: where $\gamma(X) := \mathrm{E}_{(0)}[Y \mid X]$, $\alpha(X) := \frac{\mathrm{d}P_{X}^{(1)}}{\mathrm{d}P_{X}^{(0)}}(X)$, and $\mathrm{E}_{(t)}[\cdot]$ denotes the expectation conditional on $T = t$.
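
The lemma's displayed equations are not reproduced in this summary. From the definitions of $\gamma$ and $\alpha$ alone, however, one identity follows by iterated expectations and a change of measure: $\mathrm{E}_{(1)}[\gamma(X)] = \mathrm{E}_{(0)}[\alpha(X)\,Y]$. The left side is a regression representation and the right side a re-weighting representation of the same counterfactual mean. The sketch below checks this numerically on simulated data with a binary $X$, and also evaluates the standard doubly robust combination $\mathrm{E}_{(1)}[\gamma(X)] + \mathrm{E}_{(0)}[\alpha(X)(Y - \gamma(X))]$. The data-generating process, the variable names, and the helper `dr_estimate` are assumptions made for illustration; this is a two-mechanism toy case, not the paper's multiply robust estimator or the DoWhy implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: sample 0 (T = 0) and sample 1 (T = 1) with a binary X.
n0, n1 = 5000, 5000
x0 = rng.binomial(1, 0.3, size=n0)
y0 = 1.0 + 2.0 * x0 + rng.normal(size=n0)
x1 = rng.binomial(1, 0.6, size=n1)

levels = np.array([0, 1])

# gamma(x) = E_(0)[Y | X = x], estimated by cell means in sample 0.
gamma_hat = np.array([y0[x0 == v].mean() for v in levels])

# alpha(x) = dP_X^(1) / dP_X^(0) (x), estimated by a ratio of cell frequencies.
p0 = np.array([(x0 == v).mean() for v in levels])
p1 = np.array([(x1 == v).mean() for v in levels])
alpha_hat = p1 / p0


def dr_estimate(gamma, alpha):
    """Doubly robust combination: regression term plus re-weighted residual."""
    return gamma[x1].mean() + (alpha[x0] * (y0 - gamma[x0])).mean()


theta_reg = gamma_hat[x1].mean()        # regression: average gamma over sample 1
theta_rw = (alpha_hat[x0] * y0).mean()  # re-weighting: weighted average of Y in sample 0
theta_dr = dr_estimate(gamma_hat, alpha_hat)

# Deliberately wrong nuisance inputs, one at a time.
theta_dr_bad_gamma = dr_estimate(np.zeros(2), alpha_hat)  # gamma wrongly set to 0
theta_dr_bad_alpha = dr_estimate(gamma_hat, np.ones(2))   # alpha wrongly set to 1
```

In this simulation the target is $1 + 2 \times 0.6 = 2.2$. With cell-mean and frequency-ratio estimators the regression and re-weighting estimates coincide exactly, and the doubly robust combination stays on target when either $\gamma$ or $\alpha$ (but not both) is replaced by a wrong function, which is the two-mechanism analogue of the robustness property the paper describes.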

Figures (7)

  • Figure 1: Two possible DAGs for $(T, \bm{X}, Y)$.
  • Figure 2: Visual intuition for regression and re-weighting.
  • Figure 3: DAG for the mediation example.
  • Figure 4: DAG for the gender wage gap application.
  • Figure 5: Shapley Values in our gender wage gap application. We plot the point estimates and 95% confidence intervals. We also report the point estimates (bootstrapped standard errors in brackets). We denote statistical significance at the 5% level by $^{**}$, and at the 1% level by $^{***}$.
  • ...and 2 more figures

Theorems & Definitions (22)

  • Lemma 2.1
  • Remark 2.2: Relation to Mediation
  • Lemma 2.3
  • Remark 2.4: On the Overlap Assumption
  • Theorem 2.5
  • Example 2.6
  • Remark 2.7: Identification with Covariates
  • Remark 2.8: Runtime and Speedup
  • Remark 2.9: Sample-splitting
  • Theorem 2.10
  • ...and 12 more