Deep Backtracking Counterfactuals for Causally Compliant Explanations

Klaus-Rudolf Kladny, Julius von Kügelgen, Bernhard Schölkopf, Michael Muehlebach

TL;DR

This work tackles the challenge of generating counterfactuals that respect the true causal mechanisms in high-dimensional, deep generative models. It introduces DeepBC, a framework with two variants: stochastic DeepBC, which samples counterfactuals via Langevin dynamics in a structured latent space, and mode DeepBC, which computes a single most likely counterfactual through constrained optimization and linearization techniques. By operating in the latent space of invertible deep structural causal models, DeepBC ensures causal compliance, supports complex multi-variable antecedents, and offers modularity for domain shifts. Empirical results on Morpho-MNIST and CelebA illustrate that DeepBC preserves causal relationships, produces plausible counterfactuals, and demonstrates robustness to model misspecification and alternative distance definitions. Overall, DeepBC provides a principled, extensible path toward faithful, high-dimensional counterfactual explanations grounded in causal structure.

Abstract

Counterfactuals answer questions of what would have been observed under altered circumstances and can therefore offer valuable insights. Whereas the classical interventional interpretation of counterfactuals has been studied extensively, backtracking constitutes a less studied alternative where all causal laws are kept intact. In the present work, we introduce a practical method called deep backtracking counterfactuals (DeepBC) for computing backtracking counterfactuals in structural causal models that consist of deep generative components. We propose two distinct versions of our method--one utilizing Langevin Monte Carlo sampling and the other employing constrained optimization--to generate counterfactuals for high-dimensional data. As a special case, our formulation reduces to methods in the field of counterfactual explanations. Compared to these, our approach represents a causally compliant, versatile and modular alternative. We demonstrate these properties experimentally on a modified version of MNIST and CelebA.
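To make the sampling variant concrete, the following is a minimal sketch of unadjusted Langevin dynamics on a two-variable toy model. Everything in it is an assumption made for illustration and is not taken from the paper: the linear mechanism `i = A_COEF * t + u_I`, the squared Euclidean distance between latents, the soft penalty with weight `lam` that stands in for the antecedent constraint, and all function names. It only shows the general idea of sampling counterfactual latents that stay close to the factual latents while approximately rendering the antecedent true.

```python
import numpy as np

# Hypothetical toy SCM (illustration only): thickness T = U_T,
# intensity I = A_COEF * T + U_I, so I is linear in the latents.
A_COEF = 0.5
C = np.array([A_COEF, 1.0])  # intensity = C @ (u_T, u_I)


def energy_grad(u_star, u, i_star, lam):
    """Gradient of 0.5*||u_star - u||^2 + 0.5*lam*(C @ u_star - i_star)^2."""
    return (u_star - u) + lam * C * (C @ u_star - i_star)


def langevin_samples(u, i_star, lam=100.0, step=0.005,
                     n_steps=20000, burn_in=2000, seed=0):
    """Unadjusted Langevin dynamics targeting exp(-energy)."""
    rng = np.random.default_rng(seed)
    u_star = u.copy()
    samples = []
    for k in range(n_steps):
        noise = rng.standard_normal(u.shape)
        # Gradient step on the energy plus Gaussian noise.
        u_star = (u_star - step * energy_grad(u_star, u, i_star, lam)
                  + np.sqrt(2.0 * step) * noise)
        if k >= burn_in:
            samples.append(u_star.copy())
    return np.array(samples)


u = np.array([1.0, 0.2])                    # factual latents (u_T, u_I)
samples = langevin_samples(u, i_star=2.0)   # counterfactual latent samples
intensities = samples @ C                   # intensity implied by each sample
```

With a large penalty weight the sampled intensities concentrate near the antecedent value, while the spread of the samples reflects the remaining freedom in how the latents can change. The step size must be small relative to the penalty weight for the iteration to remain stable.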

Paper Structure (62 sections, 42 equations, 13 figures, 1 table, 2 algorithms)

Figures (13)

  • Figure 1: Visualization of DeepBC for Morpho-MNIST. We generate a counterfactual (green) image $\text{img}^*$ and thickness $t^*$ with antecedent intensity $i^*$ for the factual, observable realizations (filled blue) $\text{img}$, $t$, $i$. Our approach finds new latent variables $\mathbf{u}^*$ that are close to the factual latents $\mathbf{u}$ with respect to distances $d_i$, subject to rendering the antecedent $i^*$ true. The causal mechanisms in the factual world remain unaltered in the counterfactual world. In this specific distribution, thickness and intensity are positively related, so the counterfactual image is both more intense and thicker. The dependence of $f_i$ on graphical parents is omitted to simplify the visualization.
  • Figure 2: Difference between interventional and backtracking counterfactuals on a concrete example. Variables that are conditioned on correspond to filled circles. Interventional counterfactuals perform a hard intervention (indicated by a hammer) $X^*_2 \gets x^*_2$ with antecedent $x^*_2$ (i.e., $S=\{2\}$) in the counterfactual world (green). Backtracking counterfactuals, in contrast, construct this counterfactual world by introducing a new set of latent variables $\mathbf{U}^*$ that depend on $\mathbf{U}$ via a backtracking conditional (red).
  • Figure 3: Backtracking interpretation of counterfactual explanations on a concrete example. DeepBC aims at modeling the true structural relationships between variables, exemplified by the causal graph in (a). Counterfactual explanations in the sense of Wachter et al. (2017) have a backtracking interpretation in that they instead use a predictive model $f_{\hat{Y}}$, such as a classifier or regressor, as structural equation \ref{eq:two_variables}, leading to the causal graph shown in (b). In general, the true variable $Y$, unlike the prediction $\hat{Y}$, may not be the effect of the covariates $X$ and $Z$ ($X$ and $Z$ may in addition be causally interrelated, as shown in (a)). Consequently, the counterfactuals made by counterfactual explanation methods must be interpreted differently from those made by our approach. Specifically, DeepBC intends to explain the true underlying variables rather than being confined to the prediction of a model, as can be read off the counterfactual queries in the figure. For clarity, the latent variables $\mathbf{U}$ are omitted.
  • Figure 4: Counterfactual Scalar Variables on Morpho-MNIST. The blue shaded areas indicate the probability density of the data. (a)(i) Given a factual realization (red dot), varying the values of the antecedent $i^*$ changes both $u^*_I$ and the upstream variable $u^*_T$. Since interventional counterfactuals do not perturb the latents, only the backtracking solution (grey dots) is shown. (ii) Interventional counterfactuals (green dots), in contrast to backtracking counterfactuals, leave $t^*$ unchanged when the effect variable intensity is taken as antecedent. (iii) When treating thickness as the antecedent, interventional and backtracking counterfactuals yield identical solutions. (b) For the correct graph, DeepBC counterfactuals for antecedent thickness do not change as the backtracking conditional (corresponding distance function shown under each subplot) is changed. When we perform DeepBC with the wrong graph ($I \rightarrow T$), causal compliance as described in \ref{sec:int_and_back_counterfactuals} is violated.
  • Figure 5: Counterfactual Morpho-MNIST Images: Backtracking vs. Interventional. DeepBC (top row) changes thickness alongside intensity, since their causal relation is preserved. Interventional counterfactuals (bottom row), in contrast, change only the intensity value, resulting in images that violate the causal laws and can be considered out-of-distribution w.r.t. the original data set.
  • ...and 8 more figures
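
The contrast between interventional and backtracking counterfactuals described in the captions of Figures 2, 4 and 5 can be sketched on a toy model. The sketch below is not the paper's implementation: the linear mechanism, the coefficient `A_COEF`, the equally weighted squared Euclidean distance between latents, and the helper names `forward` and `backtrack` are all assumptions chosen so that the constrained optimization has a closed-form solution (an orthogonal projection onto the constraint).

```python
import numpy as np

# Hypothetical toy SCM (illustration only): thickness T = U_T,
# intensity I = A_COEF * T + U_I. Thickness is the cause of intensity.
A_COEF = 0.5


def forward(u):
    """Map latents (u_T, u_I) to observables (t, i) via the mechanisms."""
    t = u[0]
    return t, A_COEF * t + u[1]


def backtrack(u, coeffs, target):
    """Closest latents (squared Euclidean) with coeffs @ u_star == target."""
    coeffs = np.asarray(coeffs, dtype=float)
    return u + coeffs * (target - coeffs @ u) / (coeffs @ coeffs)


u = np.array([1.0, 0.2])   # factual latents
t, i = forward(u)          # factual thickness and intensity

# Antecedent on the effect (intensity = 2.0).
# Backtracking: latents change, mechanisms stay intact, thickness moves too.
t_back, i_back = forward(backtrack(u, [A_COEF, 1.0], 2.0))
# Intervention: intensity is overwritten, the upstream thickness is untouched.
t_int, i_int = t, 2.0

# Antecedent on the cause (thickness = 1.5).
# Backtracking only needs to move u_T, so both notions agree in this toy model.
t_back2, i_back2 = forward(backtrack(u, [1.0, 0.0], 1.5))
t_int2, i_int2 = 1.5, A_COEF * 1.5 + u[1]
```

In this toy setting, setting intensity as the antecedent increases thickness under backtracking but leaves it at its factual value under intervention, whereas setting thickness as the antecedent yields the same counterfactual under both notions.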