Deep Backtracking Counterfactuals for Causally Compliant Explanations
Klaus-Rudolf Kladny, Julius von Kügelgen, Bernhard Schölkopf, Michael Muehlebach
TL;DR
This work tackles the challenge of generating counterfactuals that respect the causal mechanisms of high-dimensional structural causal models built from deep generative components. It introduces deep backtracking counterfactuals (DeepBC), a framework with two variants: stochastic DeepBC, which samples counterfactuals via Langevin dynamics in a structured latent space, and mode DeepBC, which computes a single most likely counterfactual through constrained optimization and linearization techniques. By operating in the latent space of invertible deep structural causal models, DeepBC ensures causal compliance, supports complex antecedents involving multiple variables, and offers modularity that accommodates domain shifts. Empirical results on Morpho-MNIST and CelebA indicate that DeepBC preserves causal relationships, produces plausible counterfactuals, and is robust to model misspecification and to alternative distance definitions. Overall, DeepBC provides a principled, extensible path toward faithful counterfactual explanations of high-dimensional data grounded in causal structure.
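To make the mode variant concrete, the following is a minimal sketch under several assumptions. A hypothetical two-variable SCM (x1 = u1, x2 = tanh(x1) + u2) stands in for the deep generative components, the antecedent fixes x2 to a new value, and "most likely" is taken to mean closest to the factual latents u* in squared Euclidean distance. The constraint is handled by repeatedly linearizing it and projecting u* onto the linearized constraint; this is one plausible reading of "constrained optimization and linearization", not necessarily the authors' exact algorithm. The names `decode`, `antecedent_jacobian` and `mode_counterfactual` are illustrative only.

```python
import numpy as np

# Hypothetical toy SCM (assumption, for illustration only):
#   x1 = u1,   x2 = tanh(x1) + u2
# In DeepBC the mechanisms would be invertible deep generative components.
def decode(u):
    x1 = u[0]
    x2 = np.tanh(x1) + u[1]
    return np.array([x1, x2])

def antecedent_jacobian(u):
    # Gradient of the antecedent variable x2 with respect to (u1, u2), shape (1, 2).
    return np.array([[1.0 - np.tanh(u[0]) ** 2, 1.0]])

def mode_counterfactual(u_factual, target_x2, n_iter=50):
    """Approximately solve  min ||u - u*||^2  s.t.  x2(u) = target_x2
    by linearizing the constraint around the current iterate and projecting
    u* onto the linearized constraint (a Gauss-Newton / SQP-style scheme)."""
    u = u_factual.copy()
    for _ in range(n_iter):
        g = np.array([decode(u)[1] - target_x2])  # constraint residual, shape (1,)
        J = antecedent_jacobian(u)                 # shape (1, 2)
        # Closed-form minimizer of ||v - u*||^2 subject to g + J (v - u) = 0.
        rhs = g + J @ (u_factual - u)
        u = u_factual - J.T @ np.linalg.solve(J @ J.T, rhs)
    return u

u_star = np.zeros(2)  # factual latents, so the factual observation is x = (0, 0)
u_cf = mode_counterfactual(u_star, target_x2=1.5)
x_cf = decode(u_cf)   # x2 now equals 1.5, and x1 has moved away from 0
```

Unlike an intervention on x2, which would leave x1 at 0, this backtracking solution also adjusts the upstream latent u1, so the change to x2 is explained by changes to its causes while the mechanism for x2 stays intact.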
Abstract
Counterfactuals answer questions of what would have been observed under altered circumstances and can therefore offer valuable insights. Whereas the classical interventional interpretation of counterfactuals has been studied extensively, backtracking constitutes a less studied alternative where all causal laws are kept intact. In the present work, we introduce a practical method called deep backtracking counterfactuals (DeepBC) for computing backtracking counterfactuals in structural causal models that consist of deep generative components. We propose two distinct versions of our method, one utilizing Langevin Monte Carlo sampling and the other employing constrained optimization, to generate counterfactuals for high-dimensional data. As a special case, our formulation reduces to methods in the field of counterfactual explanations. Compared to these, our approach represents a causally compliant, versatile and modular alternative. We demonstrate these properties experimentally on a modified version of MNIST and CelebA.
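The sampling-based version can be illustrated with a short sketch of Langevin Monte Carlo in latent space. It rests on assumptions that are ours, not the paper's: a hypothetical toy SCM (x1 = u1, x2 = tanh(x1) + u2), an energy that combines squared distance to the factual latents u* with a soft quadratic penalty (width `sigma`) enforcing the antecedent x2 = target, and the unadjusted Langevin algorithm with a fixed step size. The function names and default parameters are illustrative only.

```python
import numpy as np

# Hypothetical toy SCM (assumption): x1 = u1, x2 = tanh(x1) + u2.
def decode(u):
    # u has shape (n_chains, 2); returns observations x with the same shape.
    x1 = u[:, 0]
    x2 = np.tanh(x1) + u[:, 1]
    return np.stack([x1, x2], axis=1)

def grad_energy(u, u_factual, target_x2, sigma):
    """Gradient of the assumed energy
        E(u) = 0.5 * ||u - u*||^2 + (x2(u) - target_x2)^2 / (2 * sigma^2),
    which keeps u close to the factual latents while softly enforcing the antecedent."""
    resid = decode(u)[:, 1] - target_x2                    # shape (n,)
    dx2_du = np.stack([1.0 - np.tanh(u[:, 0]) ** 2,
                       np.ones(len(u))], axis=1)           # shape (n, 2)
    return (u - u_factual) + (resid / sigma**2)[:, None] * dx2_du

def langevin_counterfactuals(u_factual, target_x2, sigma=0.1, step=1e-3,
                             n_steps=5000, n_chains=2000, seed=0):
    """Unadjusted Langevin algorithm, run on many independent chains at once:
        u <- u - step * grad E(u) + sqrt(2 * step) * standard normal noise."""
    rng = np.random.default_rng(seed)
    u = np.tile(u_factual, (n_chains, 1))  # start every chain at u*
    for _ in range(n_steps):
        noise = rng.standard_normal(u.shape)
        u = (u - step * grad_energy(u, u_factual, target_x2, sigma)
             + np.sqrt(2.0 * step) * noise)
    return u

samples = langevin_counterfactuals(np.zeros(2), target_x2=1.5)
x_samples = decode(samples)  # a spread of counterfactuals with x2 close to 1.5
```

Compared with a single optimized counterfactual, the samples expose the variability of plausible explanations for the antecedent. A smaller `sigma` enforces the antecedent more tightly but makes the energy stiffer, so `step` has to shrink with it; the unadjusted sampler is also biased for any finite step size, which a Metropolis correction (MALA) would remove.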
