Exactly Computing do-Shapley Values
R. Teal Witter, Álvaro Parafita, Tomas Garriga, Maximilian Muschalik, Fabian Fumagalli, Axel Brando, Lucas Rosenblatt
TL;DR
This work tackles the computational bottleneck of do-Shapley value estimation in Structural Causal Models by reframing the problem in terms of irreducible equivalence classes of coalitions, defined by a basis and a closure. It delivers an exact algorithm whose running time is linear in the number r of irreducible sets, which lies between d and 2^d and, in real graphs, is often far below 2^d. It also introduces a structure-aware boundary sampling scheme that achieves near-optimal accuracy under a fixed query budget m and returns exact values once m ≥ r. A linear-time identifiability check lowers practical risk: identifiability of the singleton queries implies identifiability of all class queries. Empirical results on the TALENT benchmark show substantial speedups and accuracy gains over structure-agnostic baselines, underscoring the practical value for scalable causal explainability in complex systems. The framework also generalizes to other value notions and interaction indices, broadening its applicability to diverse causal attribution tasks.
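To make the range of r concrete, the sketch below counts distinct classes of coalitions on two toy graphs. It rests on an assumption made for illustration only: a coalition S is reduced to those members that still have a directed path to the outcome Y avoiding the other members of S, since interventions on the remaining members are screened off by the rest of S. The paper's basis and closure construction may be defined differently, and `reduce_coalition` and `count_classes` are hypothetical names. The brute-force enumeration over all 2^d coalitions only counts classes; it is not meant as an efficient algorithm.

```python
from itertools import combinations


def reduce_coalition(S, children, target):
    """Illustrative reduction (an assumption, not the paper's definition):
    keep each j in S that has a directed path to `target` avoiding the other members of S."""
    S = frozenset(S)
    kept = set()
    for j in S:
        blocked = S - {j}
        stack, seen = [j], {j}
        while stack:  # depth-first search from j that never enters other members of S
            node = stack.pop()
            if node == target:
                kept.add(j)
                break
            for child in children.get(node, ()):
                if child not in seen and child not in blocked:
                    seen.add(child)
                    stack.append(child)
    return frozenset(kept)


def count_classes(features, children, target):
    """Count distinct reduced coalitions by brute force over all 2^d subsets."""
    reps = set()
    for k in range(len(features) + 1):
        for S in combinations(features, k):
            reps.add(reduce_coalition(S, children, target))
    return len(reps)


d = 4
features = list(range(d))
# Hypothetical chain X0 -> X1 -> X2 -> X3 -> Y (edges stored as child lists).
chain = {i: [i + 1] for i in range(d - 1)}
chain[d - 1] = ["Y"]
# Hypothetical star: every X_i is a direct parent of Y, with no edges among the X_i.
star = {i: ["Y"] for i in range(d)}

print(count_classes(features, chain, "Y"))  # 5 = d + 1 (includes the empty coalition)
print(count_classes(features, star, "Y"))   # 16 = 2^d
```

On the chain, every coalition collapses to its most downstream member, giving d + 1 classes once the empty coalition is counted, whereas in the star no coalition collapses and all 2^d remain distinct.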
Abstract
Structural Causal Models (SCMs) are a powerful framework for describing complex dynamics across the natural sciences. A particularly elegant way to interpret SCMs is do-Shapley, a game-theoretic method that quantifies the average effect of each of $d$ variables across exponentially many interventions. As with Shapley values, computing do-Shapley values generally requires evaluating exponentially many terms. Our work rests on a reformulation of do-Shapley values in terms of the irreducible sets of the underlying SCM. Leveraging this insight, we compute do-Shapley values exactly in time linear in the number of irreducible sets $r$, which ranges from $d$ to $2^d$ depending on the graph structure of the SCM. Since $r$ is unknown a priori, we complement the exact algorithm with an estimator that, like general Shapley value estimators, can be run with any query budget. As the query budget approaches $r$, our estimator can produce estimates several orders of magnitude more accurate than those of prior methods, and once the budget reaches $r$, it returns the do-Shapley values up to machine precision. Beyond computational speed, we also reduce the identification burden: we prove that non-parametric identifiability of do-Shapley values requires only the identification of interventional effects for the $d$ singleton coalitions, rather than for all classes of coalitions.
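The following minimal sketch illustrates how grouping coalitions reduces the number of interventional queries. It assumes, for illustration only, that the value $v(S) = E[Y \mid do(X_S = x_S)]$ depends on $S$ only through the members that retain a directed path to $Y$ avoiding the rest of $S$. The names `reduce_coalition`, `do_shapley_exact`, and `query`, the linear chain SCM with edge weights `coef`, and the explained point x = (1, 1, 1) are all hypothetical. Unlike the algorithm described in the abstract, whose running time is linear in $r$, this sketch still enumerates every coalition. It only caches one query per class, so the expensive causal query runs $r$ times.

```python
from itertools import combinations
from math import factorial


def reduce_coalition(S, children, target):
    """Illustrative reduction (an assumption): keep each j in S that has a
    directed path to `target` avoiding the other members of S."""
    S = frozenset(S)
    kept = set()
    for j in S:
        blocked = S - {j}
        stack, seen = [j], {j}
        while stack:  # depth-first search from j that never enters other members of S
            node = stack.pop()
            if node == target:
                kept.add(j)
                break
            for child in children.get(node, ()):
                if child not in seen and child not in blocked:
                    seen.add(child)
                    stack.append(child)
    return frozenset(kept)


def do_shapley_exact(features, children, target, query):
    """Exact Shapley values over all coalitions, calling `query` once per class."""
    d = len(features)
    cache = {}

    def value(S):
        key = reduce_coalition(S, children, target)
        if key not in cache:  # one interventional query per distinct class
            cache[key] = query(key)
        return cache[key]

    phi = {}
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for k in range(d):
            weight = factorial(k) * factorial(d - k - 1) / factorial(d)
            for S in combinations(others, k):
                total += weight * (value(set(S) | {i}) - value(S))
        phi[i] = total
    return phi, len(cache)


# Hypothetical linear chain SCM X0 -> X1 -> X2 -> Y with zero-mean noise,
# explained at the point x = (1, 1, 1).
coef = [2.0, 0.5, 3.0]  # edge weights for X0->X1, X1->X2, X2->Y
features = [0, 1, 2]
chain = {0: [1], 1: [2], 2: ["Y"]}


def query(key):
    """E[Y | do(X_key = 1)] in the toy chain; each class has at most one member."""
    if not key:
        return 0.0  # no intervention: E[Y] = 0 under zero-mean noise
    (j,) = key
    effect = 1.0
    for a in coef[j:]:  # total effect of X_j on Y is the product of downstream weights
        effect *= a
    return effect


phi, n_queries = do_shapley_exact(features, chain, "Y", query)
print(phi, n_queries)  # approximately {0: 1.0, 1: 0.25, 2: 1.75} with 4 queries
```

On this three-variable chain, the 8 coalitions fall into 4 classes, so only 4 queries are issued, and the attributions sum to $v(\{0,1,2\}) - v(\emptyset) = 3$, as the efficiency property requires.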
