Estimating Interventional Distributions with Uncertain Causal Graphs through Meta-Learning

Anish Dhir, Cristiana Diaconu, Valentinian Mihai Lungu, James Requeima, Richard E. Turner, Mark van der Wilk

TL;DR

This work tackles the problem of estimating interventional distributions when the causal graph is uncertain. It introduces MACE-TNP, an end-to-end Transformer Neural Process that directly maps observational data to Bayesian model-averaged interventional distributions, thereby bypassing expensive intermediate posteriors. Empirical results show convergence to the analytic posterior in identifiable two-node cases, correct handling of non-identifiability with interventional data, and superior performance over strong Bayesian baselines across increasingly complex and high-dimensional settings, including real data from Sachs. The approach demonstrates the potential of meta-learning for scalable causal inference under uncertainty, while highlighting trade-offs in compute and the importance of training-data coverage for generalization.
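For orientation, the "Bayesian model-averaged interventional distribution" can be written out as follows. This is a sketch in the notation of the figure captions on this page (outcome ${\mathbf{x}}_i$, intervened node ${\mathbf{x}}_j$, graph $\mathcal{G}$, functional mechanisms $\mathbf{f}$); the paper's own equation may be arranged differently.

```latex
p(\mathbf{x}_i \mid \operatorname{do}(\mathbf{x}_j), \mathcal{D}_{\text{obs}})
  = \sum_{\mathcal{G}} p(\mathcal{G} \mid \mathcal{D}_{\text{obs}})
    \int p(\mathbf{x}_i \mid \operatorname{do}(\mathbf{x}_j), \mathbf{f}, \mathcal{G})\,
         p(\mathbf{f} \mid \mathcal{G}, \mathcal{D}_{\text{obs}})\, \mathrm{d}\mathbf{f}
```

The two "intermediate posteriors" are $p(\mathcal{G} \mid \mathcal{D}_{\text{obs}})$ and $p(\mathbf{f} \mid \mathcal{G}, \mathcal{D}_{\text{obs}})$. An end-to-end model targets the left-hand side directly instead of computing either of them explicitly.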

Abstract

In scientific domains, from biology to the social sciences, many questions boil down to "What effect will we observe if we intervene on a particular variable?" If the causal relationships (e.g. a causal graph) are known, it is possible to estimate the interventional distributions. In the absence of this domain knowledge, the causal structure must be discovered from the available observational data. However, observational data are often compatible with multiple causal graphs, making methods that commit to a single structure prone to overconfidence. A principled way to manage this structural uncertainty is via Bayesian inference, which averages over a posterior distribution on possible causal structures and functional mechanisms. Unfortunately, the number of causal structures grows super-exponentially with the number of nodes in the graph, making computations intractable. We propose to circumvent these challenges by using meta-learning to create an end-to-end model: the Model-Averaged Causal Estimation Transformer Neural Process (MACE-TNP). The model is trained to predict the Bayesian model-averaged interventional posterior distribution, and its end-to-end nature bypasses the need for expensive calculations. Empirically, we demonstrate that MACE-TNP outperforms strong Bayesian baselines. Our work establishes meta-learning as a flexible and scalable paradigm for approximating complex Bayesian causal inference that can be scaled to increasingly challenging settings in the future.
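To make the super-exponential growth concrete, the short sketch below counts labelled directed acyclic graphs with Robinson's recurrence. It is a standard combinatorial result rather than anything taken from the paper, and the function name `count_dags` is chosen here for illustration.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, count_dags(n))
```

Five nodes already admit 29281 graphs, so enumerating every structure in an explicit Bayesian model average becomes impractical very quickly.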

Paper Structure

This paper contains 41 sections, 3 theorems, 68 equations, 7 figures, 5 tables.

Key Result

Theorem B.1

Let $\mathcal{D}_{\text{obs}} := \{X^{(1)}, X^{(2)}, \ldots ,X^{(n)}\}$ be i.i.d. observations generated by one of the simple models described in \ref{eq:model_1_ident}--\ref{eq:model_3_ident}. The posterior over the graphs $[p(\mathcal{G}_1|\mathcal{D}_{\text{obs}}), p(\mathcal{G}_2|\mathcal{D}_{\text{obs}}), p(\ldots$ [...] where $c$ is a constant of normalisation and [...] The posterior interventional distribution is a mixtur[...]

Figures (7)

  • Figure 1: Overview of MACE-TNP. Classical approaches usually require a two-step procedure: 1) posterior inference over the graph structure, followed by 2) complicated inference over the functional mechanism. MACE-TNP instead amortises the full causal inference pipeline.
  • Figure 2: Overview of MACE-TNP yielding $p_{\theta}({\mathbf{x}}_i|{\operatorname{do}}({\mathbf{x}}_j), \mathcal{D}_{\text{obs}})$. Inputs are 1) embedded via variable-specific MLPs and 2) fed into a transformer encoder that alternates sample-wise and node-wise attention. The resulting outcome node representation from the unknown interventional distribution is 3) decoded to obtain the parameters of the NP distribution.
  • Figure 3: KL divergences as a function of the observational sample size, for the identifiable case (left) and the non-identifiable one (right). Dark blue denotes $p_{\text{BCM}}$, the posterior interventional distribution defined in \ref{eq:bayes_pred_intvn}; red and green use $p^{\ast}_{\text{BCM}}$, the interventional distribution conditioned on $\{{\mathbf{f}}^{\ast}, {\mathcal{G}}^{\ast}\}$. We additionally provide MACE-TNP with $M_{\text{int}}=5$ interventional samples. We indicate the median and the 10-90% quantiles.
  • Figure 4: KL divergence (right) between the interventional distribution conditioned on $\{{\mathbf{f}}^{\ast}, {\mathcal{G}}^{\ast}\}$ and the model's prediction, for the confounder (dark blue), mediator (red) and unobserved confounder (light blue) cases. With increasing sample size, MACE-TNP identifies the correct distribution for both the mediator and confounder cases, implicitly carrying out the required adjustment.
  • Figure 5: Overview of the data generation process. We first sample a graph ${\mathcal{G}}$, and a functional mechanism (conditioned on the sampled graph) for each of the $D$ nodes in the dataset. These are then used to draw $N_{\text{obs}}$ observational samples. To construct the interventional dataset, we first randomly sample a node $j$ to intervene upon, draw $N_{\text{int}}$ intervention values $\mathbf{x}_j \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, and set the values of node $j$ to be $\mathbf{x}_j$. We then draw $N_{\text{int}}$ samples of each node to form an interventional dataset $\mathcal{D}_{\text{int}}$.
  • ...and 2 more figures
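
The data generation process described in the Figure 5 caption can be sketched roughly as below. This is a minimal illustration, not the authors' pipeline: the random-ordering graph prior, the linear mechanisms with unit Gaussian noise, and all function names (`sample_dag`, `sample_data`, `generate_task`) are assumptions made here so the example runs. The page does not state which graph prior or functional mechanisms the paper uses.

```python
import numpy as np


def sample_dag(d, edge_prob, rng):
    """Sample a random DAG; adj[p, c] = 1 means an edge p -> c (assumed prior)."""
    order = rng.permutation(d)  # random causal ordering of the nodes
    upper = np.triu(rng.random((d, d)) < edge_prob, k=1)  # forward edges only
    adj = np.zeros((d, d), dtype=int)
    adj[np.ix_(order, order)] = upper  # relabel nodes according to the ordering
    return adj, order


def sample_data(adj, order, weights, n, rng, do_node=None, do_values=None):
    """Ancestral sampling; optionally clamp one node to given intervention values."""
    x = np.zeros((n, adj.shape[0]))
    for node in order:  # parents are always visited before their children
        if node == do_node:
            x[:, node] = do_values  # do(x_j): parents of j are ignored
        else:
            # Assumed mechanism: linear in the parents plus unit Gaussian noise.
            parent_weights = weights[:, node] * adj[:, node]
            x[:, node] = x @ parent_weights + rng.standard_normal(n)
    return x


def generate_task(d=5, n_obs=100, n_int=20, edge_prob=0.3, seed=0):
    rng = np.random.default_rng(seed)
    adj, order = sample_dag(d, edge_prob, rng)
    weights = rng.standard_normal((d, d))  # one mechanism per node, given the graph
    d_obs = sample_data(adj, order, weights, n_obs, rng)
    j = int(rng.integers(d))  # node to intervene upon
    x_j = rng.standard_normal(n_int)  # intervention values x_j ~ N(0, I)
    d_int = sample_data(adj, order, weights, n_int, rng, do_node=j, do_values=x_j)
    return adj, d_obs, d_int, j, x_j
```

Calling `generate_task()` returns one meta-learning task: the graph, an observational dataset of shape `(n_obs, d)`, and an interventional dataset of shape `(n_int, d)` whose column `j` holds the intervention values.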

Theorems & Definitions (4)

  • Definition 2.1
  • Theorem B.1
  • Lemma 1
  • Theorem B.2