Causal Modeling with Stationary Diffusions

Lars Lorch, Andreas Krause, Bernhard Schölkopf

TL;DR

This work introduces a graph-free, dynamical approach to causal modeling: the system's variables are modeled as draws from the stationary density $\mu$ of a diffusion $d\mathbf{x}_t = f(\mathbf{x}_t)\,dt + \sigma(\mathbf{x}_t)\,d\mathbb{W}_t$. Interventions are represented as modifications of the drift $f$ and the diffusion $\sigma$, and the Kernel Deviation from Stationarity (KDS) provides a differentiable, kernel-based objective for learning the SDEs from interventional data. The authors establish a representer-based characterization of the stationarity condition, prove consistency for Matérn kernels, and demonstrate gradient-based learning without sampling from the model. Empirically, stationary diffusions learned via the KDS outperform several causal baselines on synthetic cyclic systems and gene-regulatory networks, including when generalizing to unseen interventions. The framework supports causal reasoning in cyclic and dynamical settings without an explicit causal graph, and the KDS objective may be of independent interest.
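To make the idea of a stationary density that shifts under intervention concrete, here is a minimal sketch using only the Python standard library. It is not the authors' method (which avoids sampling); it simply simulates a one-dimensional linear SDE with Euler-Maruyama. The function names, the parameters `a`, `b`, `c`, and the choice of an additive drift shift as the "intervention" are all illustrative assumptions.

```python
import math
import random


def simulate_linear_sde(a, b, c, n_steps=200_000, dt=0.01, burn_in=20_000, seed=0):
    """Euler-Maruyama samples of dx = (a*x + b) dt + c dW, kept after a burn-in."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x = 0.0
    samples = []
    for step in range(n_steps):
        # One Euler-Maruyama step: drift times dt plus diffusion times a Brownian increment.
        x += (a * x + b) * dt + c * sqrt_dt * rng.gauss(0.0, 1.0)
        if step >= burn_in:
            samples.append(x)
    return samples


def mean_and_variance(values):
    """Empirical mean and (biased) variance of a list of floats."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


# Observational system (assumed parameters). For a < 0 the stationary
# density of this SDE is Gaussian with mean -b/a and variance c^2 / (-2a).
obs_mean, obs_var = mean_and_variance(simulate_linear_sde(a=-1.0, b=0.0, c=1.0))

# Hypothetical shift intervention: add a constant to the drift of x.
int_mean, int_var = mean_and_variance(simulate_linear_sde(a=-1.0, b=2.0, c=1.0))

print(f"observational: mean={obs_mean:.3f}, var={obs_var:.3f}")
print(f"intervened:    mean={int_mean:.3f}, var={int_var:.3f}")
```

Under these assumptions, the empirical mean should move from roughly 0 to roughly 2 while the variance stays near 0.5, which is the kind of distribution shift that modifying the drift induces in the stationary density.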

Abstract

We develop a novel approach towards causal inference. Rather than structural equations over a causal graph, we learn stochastic differential equations (SDEs) whose stationary densities model a system's behavior under interventions. These stationary diffusion models do not require the formalism of causal graphs, let alone the common assumption of acyclicity. We show that in several cases, they generalize to unseen interventions on their variables, often better than classical approaches. Our inference method is based on a new theoretical result that expresses a stationarity condition on the diffusion's generator in a reproducing kernel Hilbert space. The resulting kernel deviation from stationarity (KDS) is an objective function of independent interest.
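The stationarity condition mentioned in the abstract can be written out in standard notation. The block below is a sketch under the usual regularity assumptions, not a restatement of the paper's theorems: the test function $h$, the RKHS $\mathcal{H}$ with kernel $k$, and the exact form of the supremum are notational assumptions, and the paper's precise conditions may differ.

```latex
% Generator of dx_t = f(x_t) dt + sigma(x_t) dW_t, applied to a test function h
\mathcal{L} h(\mathbf{x})
  = f(\mathbf{x}) \cdot \nabla h(\mathbf{x})
  + \tfrac{1}{2} \operatorname{tr}\!\big( \sigma(\mathbf{x}) \sigma(\mathbf{x})^{\top} \nabla^2 h(\mathbf{x}) \big)

% Stationarity of mu: the expected infinitesimal change of every test function vanishes
\mathbb{E}_{\mathbf{x} \sim \mu}\big[ \mathcal{L} h(\mathbf{x}) \big] = 0
  \quad \text{for all test functions } h

% Kernelized deviation: worst case over the unit ball of an RKHS H with kernel k
\mathrm{KDS}(\mathcal{L}, \mu)
  = \sup_{\|h\|_{\mathcal{H}} \le 1} \mathbb{E}_{\mathbf{x} \sim \mu}\big[ \mathcal{L} h(\mathbf{x}) \big]^2
  = \mathbb{E}_{\mathbf{x}, \mathbf{x}' \sim \mu}\big[ \mathcal{L}_{\mathbf{x}} \mathcal{L}_{\mathbf{x}'} k(\mathbf{x}, \mathbf{x}') \big]
```

The last expression involves only expectations over $\mu$ and derivatives of $k$, so it can be estimated from samples of the target density.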

Paper Structure

This paper contains 65 sections, 6 theorems, 29 equations, 6 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

Let $\mu$ be a probability density over $\mathbb{R}^d$ and assume that the functions $f$, $\sigma$, and the partial [...] for any $h \in \mathcal{H}$. Moreover, $g_{\mu,\mathcal{L}}(\cdot) = \mathbb{E}_{\mathbf{x} \sim \mu}[\,\cdots\,]$ [...]

Footnote: Like Steinwart and Christmann (2008), we use $\partial/\partial x_{i,i}$ to denote the first-order partial derivative w.r.t. both function arguments, so $\partial/\partial x_{i,i}\, k(\mathbf{x}, \dots)$ [...]

Figures (6)

  • Figure 1: Stationary SDEs as causal models. The bottom axes show sample paths of a stationary diffusion in ${\mathbb{R}^2}$ before (pale) and after (dark) an intervention on the SDE governing $x_1$. The marginals $p(x_j)$ visualize the distribution shift in $p(x_1, x_2)$.
  • Figure 2: Components of the KDS for a stationary linear SDE and a Gaussian kernel ${k_\gamma}$ with $\gamma = 0.5$. Expectations over $\mu$ are approximated with $1000$ samples. 1: Densities of a target ($\mu$, black) and two alternative models. 2: KDS witness functions for the misspecified models. 3: Witnesses after applying $\mathcal{L}$, yielding their time derivatives in the diffusion. After multiplying by $\mu$, the KDS is equal to the integral of the shaded areas. 4-5: KDS derivatives with respect to $a$ and $c$, fixing the other parameters at those of the target model. The partial derivatives have zeroes at the true parameters of the model inducing $\mu$, thus gradient descent drives the incorrect $a$ and $c$ to their true values (indicated by vertical, dashed lines).
  • Figure 3: Inference intuition. Our goal is to infer mechanisms ${f_{\bm{\theta}}, \sigma_{\bm{\theta}}}$ that explain the observed densities $\nu_{1:M}$. To achieve this, we jointly learn $\bm{\theta}$ and interventions ${\bm{\phi}_i}$ that induce stationary densities ${\mu_{\bm{\phi}_i}}$ fitting $\nu_i$.
  • Figure 4: Benchmarking results (${d=20}$ variables, Erdős-Rényi causal structure). Metrics are computed from 10 test interventions on unseen target variables in 50 randomly generated systems. Box plots show medians and interquartile ranges (IQR). Whiskers extend to the largest value inside 1.5 times the IQR length from the boxes. Overall, causal stationary diffusions learned via the KDS (Algorithm 1, bold-faced) are the most accurate at predicting the effects of interventions on unseen targets, measured in terms of both ${W_2}$ ($\downarrow$) and MSE ($\downarrow$).
  • Figure 5: Computing the generator gradient ${\nabla_{\bm{\theta}} \mathcal{L}^{\bm{\theta}}_{\mathbf{x}}\mathcal{L}^{\bm{\theta}}_{\mathbf{x}'}k(\mathbf{x},\mathbf{x}')}$ via two calls of the operator ${\mathcal{L}^{\bm{\theta}}}$ (here: L).
  • ...and 1 more figure
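The quantities in the Figure 2 and Figure 5 captions can be illustrated in one dimension. The following is a minimal sketch, assuming a linear SDE $dx = a x\,dt + c\,dW$ with generator $\mathcal{L}h(x) = a x h'(x) + \tfrac{1}{2}c^2 h''(x)$ and a Gaussian kernel $k(x, y) = \exp(-(x-y)^2 / (2\gamma^2))$. The parameterization, the bandwidth $\gamma = 1$, the sample size, and all function names are assumptions made for illustration; the derivatives are written out by hand rather than computed by automatic differentiation.

```python
import math
import random


def gen2_gauss_kernel(x, y, a, c, gamma):
    """L_x L_y k(x, y) for k(x, y) = exp(-(x - y)^2 / (2 gamma^2)) and the
    generator L h(x) = a*x*h'(x) + 0.5*c^2*h''(x) of dx = a*x dt + c dW."""
    s = gamma * gamma
    r = x - y
    k = math.exp(-r * r / (2.0 * s))
    # Second to fourth derivatives of the kernel with respect to r = x - y.
    d2 = (r * r / s**2 - 1.0 / s) * k
    d3 = (-r**3 / s**3 + 3.0 * r / s**2) * k
    d4 = (r**4 / s**4 - 6.0 * r * r / s**3 + 3.0 / s**2) * k
    # Apply L in y, then L in x (derivatives in y pick up a sign flip via r).
    return -a * a * x * y * d2 + 0.5 * a * c * c * r * d3 + 0.25 * c**4 * d4


def kds_estimate(samples, a, c, gamma=1.0):
    """V-statistic estimate of E_{x,x'}[L_x L_x' k(x, x')] over all sample pairs."""
    n = len(samples)
    total = 0.0
    for x in samples:
        for y in samples:
            total += gen2_gauss_kernel(x, y, a, c, gamma)
    return total / (n * n)


# Target density: stationary density of dx = a*x dt + c dW with a=-1, c=1,
# which is N(0, c^2 / (-2a)). These parameter values are assumed.
rng = random.Random(0)
true_a, true_c = -1.0, 1.0
stationary_std = math.sqrt(true_c**2 / (-2.0 * true_a))
samples = [rng.gauss(0.0, stationary_std) for _ in range(400)]

kds_true = kds_estimate(samples, true_a, true_c)     # model matches the data
kds_wrong_c = kds_estimate(samples, true_a, 2.0)     # misspecified diffusion
kds_wrong_a = kds_estimate(samples, -3.0, true_c)    # misspecified drift

print(f"KDS at true parameters: {kds_true:.4f}")
print(f"KDS with wrong c:       {kds_wrong_c:.4f}")
print(f"KDS with wrong a:       {kds_wrong_a:.4f}")
```

In this sketch the estimate should be close to zero when the model parameters match those that generated the samples and clearly positive for the two misspecified models. The estimate is a smooth function of `a` and `c`, which is what makes gradient-based fitting possible; in higher dimensions one would typically obtain the derivatives by automatic differentiation rather than by hand.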

Theorems & Definitions (6)

  • Lemma 1
  • Theorem 2
  • Theorem 3
  • Lemma 4
  • Lemma 5
  • Lemma 6