Modeling Causal Mechanisms with Diffusion Models for Interventional and Counterfactual Queries

Patrick Chao, Patrick Blöbaum, Sapan Patel, Shiva Prasad Kasiviswanathan

Abstract

We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available. Building on recent developments in diffusion models, we introduce diffusion-based causal models (DCM) to learn causal mechanisms that generate unique latent encodings. These encodings enable us to sample directly under interventions and to perform abduction for counterfactuals. Diffusion models are a natural fit here, since they can encode each node to a latent representation that acts as a proxy for exogenous noise. Our empirical evaluations demonstrate significant improvements over existing state-of-the-art methods for answering causal queries. Furthermore, we provide theoretical results that offer a methodology for analyzing counterfactual estimation in general encoder-decoder models, which could be useful in settings beyond our proposed approach.
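
The encode-then-decode workflow described in the abstract can be illustrated with a minimal sketch. This is not the DCM implementation: a hand-written invertible encoder and decoder stand in for the trained diffusion model, and the two-node SCM, the functions `encode`, `decode`, `counterfactual_x2`, and `interventional_x2` are all hypothetical names introduced only for illustration.

```python
import numpy as np

# Hypothetical toy SCM (an assumption, not from the paper):
#   X1 := U1,   X2 := 2 * X1 + U2
# A closed-form encoder/decoder pair stands in for a learned diffusion model.

def encode(x2, x1):
    """Abduction: map X2 and its parent to a latent acting as a noise proxy."""
    return x2 - 2.0 * x1

def decode(z, x1):
    """Prediction: regenerate X2 from a latent and a (possibly intervened) parent."""
    return 2.0 * x1 + z

def counterfactual_x2(x1_obs, x2_obs, x1_new):
    """Counterfactual X2 had X1 been set to x1_new, for the observed unit."""
    z = encode(x2_obs, x1_obs)   # abduction: recover the unit's latent
    return decode(z, x1_new)     # action + prediction with the same latent

def interventional_x2(x1_new, n, rng):
    """Interventional samples of X2 under do(X1 = x1_new) using fresh latents."""
    z = rng.normal(size=n)       # assumed latent distribution for this toy case
    return decode(z, np.full(n, x1_new))

rng = np.random.default_rng(0)
u2 = rng.normal(size=5)
x1 = rng.normal(size=5)
x2 = 2.0 * x1 + u2               # observational data

x1_cf = np.full(5, 1.0)          # intervention do(X1 = 1)
x2_cf = counterfactual_x2(x1, x2, x1_cf)
x2_cf_true = 2.0 * x1_cf + u2    # ground truth, known only because the SCM is synthetic
x2_int = interventional_x2(1.0, 1000, rng)
```

In this toy case the counterfactual reuses the latent recovered from the observed unit, while the interventional query draws new latents; that distinction is the only point the sketch is meant to convey.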

Paper Structure

This paper contains 22 sections, 12 theorems, 52 equations, 8 figures, 8 tables, 7 algorithms.

Key Result

Theorem 1

Assume for $X \in \mathcal{X}\subset \mathbb{R}$ and exogenous noise $U\sim \mathrm{Unif}[0,1]$, $X$ satisfies the structural equation: $X:=f(X_{\mathrm{pa}},U)$, where $X_\mathrm{pa} \in \mathcal{X}_\mathrm{pa} \subset \mathbb{R}^d$ are the parents of node $X$ and $U\perp \!\!\! \perp X_\mathrm{pa}$. Then, $g(X,X_\mathrm{pa}) = \tilde{q} (U)$ for an invertible function $\tilde{q}$.
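
As a numerical illustration of the conclusion $g(X,X_\mathrm{pa}) = \tilde{q}(U)$, the sketch below uses a hypothetical nonadditive structural equation with a single parent ($d=1$). It reads $g$ as an encoder whose output does not depend on $X_\mathrm{pa}$; the excerpt above does not define $g$ or list the theorem's full conditions, so the particular `f`, `g`, and `q_tilde` here are assumptions chosen only to make the identity checkable, not objects taken from the paper.

```python
import numpy as np

def f(x_pa, u):
    """Hypothetical structural equation, strictly increasing in u for every x_pa."""
    return x_pa + (1.0 + x_pa**2) * np.exp(u)

def g(x, x_pa):
    """Hypothetical encoder: removes the parent's influence from x."""
    return (x - x_pa) / (1.0 + x_pa**2)

def q_tilde(u):
    """Invertible map with g(X, X_pa) = q_tilde(U) for this particular f and g."""
    return np.exp(u)

rng = np.random.default_rng(0)
n = 1000
u = rng.uniform(0.0, 1.0, size=n)   # exogenous noise U ~ Unif[0, 1]
x_pa = rng.normal(size=n)           # parent values, drawn independently of U
x = f(x_pa, u)

z = g(x, x_pa)                      # encoding of each sample
u_recovered = np.log(z)             # inverting q_tilde recovers the noise

max_encoding_error = np.max(np.abs(z - q_tilde(u)))
max_noise_error = np.max(np.abs(u_recovered - u))
```

Because the encoding is an invertible function of $U$ alone, it carries the same information as the exogenous noise, which is why such a latent can serve for abduction in this toy case.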

Figures (8)

  • Figure 1: Left: Nonlinear setting (NLIN), Right: Nonadditive setting (NADD). Box plots of observational, interventional, and counterfactual queries of the ladder and random SCMs over $20$ random initializations of the model and training data.
  • Figure 2: Top: Nonlinear setting (NLIN), Bottom: Nonadditive setting (NADD). Box plots of observational, interventional, and counterfactual queries of four different SCMs over $10$ random initializations of the model and training data.
  • Figure 3: Ladder graph used in Section \ref{sec: exp}.
  • Figure 4: Example of a random graph used in Section \ref{sec: exp}, with exogenous noise nodes omitted for clarity.
  • Figure 5: Chain graph.
  • ...and 3 more figures

Theorems & Definitions (18)

  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Theorem 2
  • Lemma 1
  • proof
  • Theorem 2
  • proof
  • Corollary 2
  • proof
  • ...and 8 more