
Effective Bayesian Causal Inference via Structural Marginalisation and Autoregressive Orders

Christian Toth, Christian Knoll, Franz Pernkopf, Robert Peharz

TL;DR

This work tackles epistemic uncertainty in causal inference by decomposing Bayesian causal marginalisation into two tractable steps: marginalisation over causal orders and marginalisation over DAGs given an order. It introduces ARCO, a gradient-friendly auto-regressive model over causal orders, and couples it with exact DAG marginalisation under a maximum parent set size using Gaussian-process mechanisms (ARCO-GP). The approach yields state-of-the-art structure learning performance on nonlinear additive noise models and competitive results on real data, while enabling accurate inference of interventional distributions and average causal effects through the posterior over SCMs. By explicitly representing and marginalising causal structure, ARCO-GP provides principled uncertainty quantification for causal queries with practical implications for downstream decision making.
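
Written out, the two-step marginalisation takes roughly the following form. This is a sketch in generic Bayesian-model-averaging notation, not an equation copied from the paper. It assumes a hierarchical prior $p(L)\,p(G \mid L)$ and that the query depends on the order $L$ only through the graph $G$. The symbols $\mathcal{G}_K(L)$ (DAGs consistent with $L$ with at most $K$ parents per node), the sample count $S$ and $q_{\bm{\theta}}(L)$ (ARCO, assumed here to be fitted to approximate the order posterior $p(L \mid \mathcal{D})$) are notation introduced for illustration.

```latex
\begin{aligned}
p(Y \mid \mathcal{D})
  &= \sum_{L} p(L \mid \mathcal{D}) \sum_{G \in \mathcal{G}_K(L)} p(G \mid L, \mathcal{D})\, p(Y \mid G, \mathcal{D}) \\
  &\approx \frac{1}{S} \sum_{s=1}^{S} \sum_{G \in \mathcal{G}_K(L^{(s)})} p(G \mid L^{(s)}, \mathcal{D})\, p(Y \mid G, \mathcal{D}),
  \qquad L^{(s)} \sim q_{\bm{\theta}}(L).
\end{aligned}
```

The inner sum over $\mathcal{G}_K(L)$ is the part that can be computed exactly, while the outer sum over all $d!$ orders is replaced by a Monte Carlo average over sampled orders.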

Abstract

The traditional two-stage approach to causal inference first identifies a single causal model (or equivalence class of models), which is then used to answer causal queries. However, this neglects any epistemic model uncertainty. In contrast, Bayesian causal inference does incorporate epistemic uncertainty into query estimates via Bayesian marginalisation (posterior averaging) over all causal models. While principled, this marginalisation over entire causal models, i.e., both causal structures (graphs) and mechanisms, poses a tremendous computational challenge. In this work, we address this challenge by decomposing structure marginalisation into the marginalisation over (i) causal orders and (ii) directed acyclic graphs (DAGs) given an order. We can marginalise the latter in closed form by limiting the number of parents per variable and utilising Gaussian processes to model mechanisms. To marginalise over orders, we use a sampling-based approximation, for which we devise a novel auto-regressive distribution over causal orders (ARCO). Our method outperforms the state of the art in structure learning on simulated non-linear additive noise benchmarks, and yields competitive results on real-world data. Furthermore, we can accurately infer interventional distributions and average causal effects.
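
The abstract describes ARCO as an auto-regressive distribution over causal orders. A minimal sketch of that idea follows, under the simplifying assumption that a fixed score table `logits[t, j]` (hypothetical, standing in for ARCO's neural network) scores placing node `j` at position `t`. The order is built one position at a time via a softmax over the nodes not yet placed, so every permutation receives an explicit, normalised probability.

```python
import itertools
import numpy as np


def masked_log_softmax(scores, available):
    """Log-softmax over the entries flagged in `available`; others get -inf."""
    masked = np.where(available, scores, -np.inf)
    shift = masked[available].max()
    log_norm = shift + np.log(np.exp(masked[available] - shift).sum())
    return masked - log_norm


def sample_order(logits, rng):
    """Sample a causal order position by position; return (order, log-probability).

    `logits[t, j]` is a hypothetical score for putting node j at position t.
    """
    d = logits.shape[1]
    available = np.ones(d, dtype=bool)
    order, log_p = [], 0.0
    for t in range(d):
        log_probs = masked_log_softmax(logits[t], available)
        j = int(rng.choice(d, p=np.exp(log_probs)))  # placed nodes have probability 0
        order.append(j)
        log_p += float(log_probs[j])
        available[j] = False
    return order, log_p


def order_log_prob(logits, order):
    """Log-probability of a given order under the same autoregressive model."""
    available = np.ones(logits.shape[1], dtype=bool)
    log_p = 0.0
    for t, j in enumerate(order):
        log_p += float(masked_log_softmax(logits[t], available)[j])
        available[j] = False
    return log_p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(3, 3))
    print(sample_order(logits, rng))
    # The probabilities of all 3! orders sum to 1 (up to floating-point rounding).
    print(sum(np.exp(order_log_prob(logits, list(p)))
              for p in itertools.permutations(range(3))))
```

Because `order_log_prob` is an explicit function of the scores, a model of this shape can be fitted with gradient-based methods; the parametrisation and training objective that ARCO actually uses are specified in the paper and are not reproduced here.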

Paper Structure (44 sections, 2 theorems, 37 equations, 4 figures, 11 tables, 2 algorithms)

Key Result

Proposition 4.1

Let $Y(G) = \prod_i Y_i(\textbf{Pa}_i^G)$ and $w(G) = \prod_i w(\textbf{Pa}_i^G)$ factorise over the parent sets; then
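
The equation that completes Proposition 4.1 did not survive extraction. The sketch below numerically checks the identity such a statement rests on, assuming it is the standard order-based factorisation. For a fixed order $L$ and at most $K$ parents per node, each DAG consistent with $L$ corresponds to exactly one independent choice of parent set per node. Hence $\sum_G w(G)\,Y(G)$ equals $\prod_i \sum_{\textbf{Pa}_i} w(\textbf{Pa}_i)\,Y_i(\textbf{Pa}_i)$, where each inner sum runs over subsets of node $i$'s predecessors of size at most $K$. The weights and query terms are random placeholders, and the helper names are hypothetical.

```python
import itertools
import math
import numpy as np


def parent_sets(order, node, k):
    """All candidate parent sets of `node`: subsets of its predecessors with size <= k."""
    preds = order[: order.index(node)]
    return [frozenset(c) for r in range(min(k, len(preds)) + 1)
            for c in itertools.combinations(preds, r)]


def brute_force(order, k, w, y):
    """Sum of w(G) * Y(G) over every DAG consistent with `order` (max k parents)."""
    nodes = range(len(order))
    total = 0.0
    # One parent-set choice per node is exactly one DAG consistent with the order.
    for graph in itertools.product(*(parent_sets(order, i, k) for i in nodes)):
        total += math.prod(w[i][pa] * y[i][pa] for i, pa in zip(nodes, graph))
    return total


def factorised(order, k, w, y):
    """The same quantity as a product over nodes of per-node sums."""
    return math.prod(sum(w[i][pa] * y[i][pa] for pa in parent_sets(order, i, k))
                     for i in range(len(order)))


# Hypothetical example: random factor weights w_i(Pa) and query terms Y_i(Pa).
rng = np.random.default_rng(0)
order, k = [2, 0, 3, 1], 2
w = [{pa: rng.uniform() for pa in parent_sets(order, i, k)} for i in range(len(order))]
y = [{pa: rng.normal() for pa in parent_sets(order, i, k)} for i in range(len(order))]
print(brute_force(order, k, w, y), factorised(order, k, w, y))  # agree up to rounding
```

The brute-force sum grows with the product of the per-node option counts, whereas the factorised form needs only their sum. With $d$ nodes, each node has at most $\sum_{r \le K} \binom{d-1}{r}$ candidate parent sets, which is why bounding $K$ keeps the per-order marginal tractable.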

Figures (4)

  • Figure 1: Generative model of ARCO-GP. We characterise a Structural Causal Model (SCM) $\mathcal{M} = (G, \mathbf{f}, \bm{\psi})$ by a causal graph $G$, causal mechanisms $\mathbf{f}$ and parameters $\bm{\psi}$ of a joint distribution over mechanisms and exogenous variables $p(\mathbf{f}, \mathbf{U}\,|\, \bm{\psi})$. We model the mechanisms $\mathbf{f}$ using Gaussian Processes (GPs) and $\mathbf{U}$ as additive Gaussian noise, implying that $\bm{\psi}$ is a set of GP hyper-parameters. The SCM gives rise to the data-generating likelihood $p(\mathcal{D}\,|\, \mathbf{f}, \bm{\psi}, G)$ and determines the (distribution over the) causal query $Y$. To sample an SCM, we first sample a causal order $L$ from a neural auto-regressive distribution over causal orders (ARCO) $p(L\,|\, \bm{\theta})$ with parameters $\bm{\theta}$. Given a causal order and assuming a limited maximum cardinality of parent sets, we can then sample or marginalise causal graphs $G$ and mechanisms $\mathbf{f}$ in closed form.
  • Figure 2: Causal discovery on nonlinear additive noise models. Structure learning results in terms of expected structural Hamming distance (ESHD) and ancestor adjustment identification distance (A-AID) on simulated non-linear models with scale-free (left, blue) and Erdős-Rényi (right, orange) graphs, each with $20$ nodes and $200$ data samples. Whiskers indicate maximum, minimum and median values across $20$ simulated ground-truth instances. For both metrics, lower is better. The ESHD axis range is limited for readability, which omits the result for DDS (ESHD $>125$).
  • Figure 3: Posterior interventional distributions. Several interventional distributions as inferred by ARCO-GP (red, solid) and the corresponding ground truth (blue, dashed). Specifically, we sampled full SCMs (orders, graphs given orders, mechanisms, exogenous variables) and performed the indicated intervention to produce a sample from the corresponding distribution, which effectively marginalises over the posterior over SCMs. Vertical lines indicate the estimated distribution means (average causal effects). See the appendix on the experimental setup for details.
  • Figure 4: Sachs Graph. Consensus protein interaction graph from Sachs et al. (2005). We relabeled nodes to avoid misinterpretation of our simulation results. Nodes X0 to X10 correspond to the original labels ['PKC', 'PKA', 'Jnk', 'P38', 'Raf', 'Mek', 'Erk', 'Akt', 'Plcg', 'PIP3', 'PIP2'].
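
Figures 1 and 3 describe sampling an SCM (order, graph given order, GP mechanisms, additive Gaussian noise) and then simulating interventions on it. The sketch below mimics that pipeline from the prior for a fixed order. Several choices here are assumptions rather than the paper's: a uniform choice among parent sets of size at most `k`, an RBF kernel with fixed hyper-parameters, standard-normal root nodes and a fixed noise scale. Drawing the posterior interventional distributions of Figure 3 would additionally require conditioning each GP on observed data, which is not shown. To keep one mechanism per node across regimes, the samples of all regimes are stacked and each GP function is drawn once, jointly over them; the helper names are hypothetical.

```python
import itertools
import numpy as np


def sample_parents(order, k, rng):
    """Hypothetical uniform graph prior: each node draws one parent set of size <= k
    from the nodes that precede it in `order`."""
    parents = {}
    for pos, node in enumerate(order):
        preds = order[:pos]
        options = [list(c) for r in range(min(k, len(preds)) + 1)
                   for c in itertools.combinations(preds, r)]
        parents[node] = options[rng.integers(len(options))]
    return parents


def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between the rows of x (shape m x p)."""
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def simulate(order, parents, n, interventions, rng, noise_std=0.1):
    """Ancestral sampling of one GP-based SCM under several regimes.

    `interventions` is a list of {node: value} dicts; {} is the observational regime.
    Returns an array of shape (len(interventions), n, number of nodes).
    """
    r, d = len(interventions), len(order)
    x = np.zeros((r, n, d))
    for node in order:  # parents are always filled in before their children
        pa = parents[node]
        if pa:
            inputs = x[:, :, pa].reshape(r * n, len(pa))  # stack all regimes
            cov = rbf_kernel(inputs) + 1e-6 * np.eye(r * n)  # jitter for Cholesky
            f = np.linalg.cholesky(cov) @ rng.standard_normal(r * n)  # one GP draw
            x[:, :, node] = f.reshape(r, n) + noise_std * rng.standard_normal((r, n))
        else:
            x[:, :, node] = rng.standard_normal((r, n))  # root: exogenous noise only
        for reg, do in enumerate(interventions):
            if node in do:
                x[reg, :, node] = do[node]  # do(X_node = value) replaces the mechanism
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    order, k = [0, 1, 2], 2
    aces = []
    for _ in range(20):  # average over SCM draws (prior, not posterior)
        parents = sample_parents(order, k, rng)
        x = simulate(order, parents, 100, [{0: -1.0}, {0: 1.0}], rng)
        aces.append(x[1, :, 2].mean() - x[0, :, 2].mean())
    print("ACE of X0 on X2, averaged over sampled SCMs:", np.mean(aces))
```

Averaging the difference of interventional means across sampled SCMs is what turns per-model effects into a model-averaged average causal effect; with posterior rather than prior samples, this corresponds to the vertical lines in Figure 3.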

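Figure 2 reports the expected structural Hamming distance (ESHD). One common way to estimate it, assumed here rather than taken from the paper's evaluation code, is to average the SHD between the ground-truth DAG and graphs sampled from the posterior. SHD conventions differ across libraries; this sketch counts each node pair whose edge is missing, extra or reversed exactly once. The optional `weights` argument is a hypothetical extension for weighted posterior samples.

```python
import numpy as np


def shd(true_adj, est_adj):
    """Structural Hamming distance between two DAG adjacency matrices.

    adj[i, j] = 1 means an edge i -> j. Each unordered node pair whose edge
    status differs (missing, extra or reversed edge) adds 1.
    """
    t = np.asarray(true_adj, dtype=bool)
    e = np.asarray(est_adj, dtype=bool)
    diff = (t != e) | (t.T != e.T)  # pair (i, j) differs in either direction
    return int(np.triu(diff, k=1).sum())


def expected_shd(true_adj, sampled_adjs, weights=None):
    """Monte Carlo estimate of E[SHD] over posterior graph samples (optionally weighted)."""
    dists = np.array([shd(true_adj, g) for g in sampled_adjs], dtype=float)
    if weights is None:
        return float(dists.mean())
    w = np.asarray(weights, dtype=float)
    return float((w * dists).sum() / w.sum())
```

For a ground truth 0 → 1 → 2, a sample that reverses 1 → 2 and a sample that adds 0 → 2 each score 1, while the empty graph scores 2, so their unweighted ESHD is 4/3.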
Theorems & Definitions (4)

  • Proposition 4.1
  • Proposition 4.2
  • proof
  • proof