Effective Bayesian Causal Inference via Structural Marginalisation and Autoregressive Orders
Christian Toth, Christian Knoll, Franz Pernkopf, Robert Peharz
TL;DR
This work tackles epistemic uncertainty in causal inference by decomposing Bayesian causal marginalisation into two tractable steps: marginalisation over causal orders and marginalisation over DAGs given an order. It introduces ARCO, a gradient-friendly auto-regressive model over causal orders, and couples it with exact DAG marginalisation under a maximum parent set size using Gaussian-process mechanisms (ARCO-GP). The approach yields state-of-the-art structure learning performance on nonlinear additive noise models and competitive results on real data, while enabling accurate inference of interventional distributions and average causal effects through a posterior over structural causal models (SCMs). By explicitly representing and marginalising causal structure, ARCO-GP provides principled uncertainty quantification for causal queries, with practical implications for downstream decision making.
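The two-step decomposition can be sketched in symbols. The notation here is an assumption made for illustration and is not taken from the paper: L denotes a causal order, G a DAG consistent with L, D the data, and Y a causal query; the order is assumed to influence the query only through the graph.

```latex
p(Y \mid D)
  = \sum_{L} p(L \mid D)
    \sum_{G \,\text{consistent with}\, L} p(G \mid L, D)\; p(Y \mid G, D)
```

Under this reading, the outer sum over orders is the part approximated by sampling, the inner sum over DAGs is the part handled exactly under a bound on parent set size, and p(Y | G, D) still contains the marginalisation over mechanisms.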
Abstract
The traditional two-stage approach to causal inference first identifies a single causal model (or equivalence class of models), which is then used to answer causal queries. However, this neglects any epistemic model uncertainty. In contrast, Bayesian causal inference does incorporate epistemic uncertainty into query estimates via Bayesian marginalisation (posterior averaging) over all causal models. While principled, this marginalisation over entire causal models, i.e., both causal structures (graphs) and mechanisms, poses a tremendous computational challenge. In this work, we address this challenge by decomposing structure marginalisation into the marginalisation over (i) causal orders and (ii) directed acyclic graphs (DAGs) given an order. We can marginalise the latter in closed form by limiting the number of parents per variable and utilising Gaussian processes to model mechanisms. To marginalise over orders, we use a sampling-based approximation, for which we devise a novel auto-regressive distribution over causal orders (ARCO). Our method outperforms the state of the art in structure learning on simulated non-linear additive noise benchmarks, and yields competitive results on real-world data. Furthermore, we can accurately infer interventional distributions and average causal effects.
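To make the two ingredients concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy auto-regressive order distribution whose conditional depends only on the last variable placed (the table `theta` and the function `order_logits` are hypothetical stand-ins for a learned model), a uniform prior over parent sets, and a placeholder local score `toy_score` where a Gaussian-process marginal likelihood would be used in practice. Gradient-based training of the order distribution is not shown.

```python
import itertools
import math
import random


def order_logits(prefix, remaining, theta):
    # Hypothetical stand-in for a learned conditional: score each remaining
    # variable given only the last variable placed (the final row of theta
    # plays the role of a "start" token).
    last = prefix[-1] if prefix else len(theta) - 1
    return [theta[last][j] for j in remaining]


def _log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sample_order(theta, rng):
    """Draw an order one variable at a time; return (order, log-probability)."""
    d = len(theta) - 1
    prefix, remaining, logp = [], list(range(d)), 0.0
    while remaining:
        log_probs = _log_softmax(order_logits(prefix, remaining, theta))
        weights = [math.exp(lp) for lp in log_probs]
        idx = rng.choices(range(len(remaining)), weights=weights)[0]
        logp += log_probs[idx]
        prefix.append(remaining.pop(idx))
    return prefix, logp


def order_log_prob(order, theta):
    """Log-probability of a full order under the same factorisation."""
    remaining, logp = list(range(len(order))), 0.0
    for pos, v in enumerate(order):
        log_probs = _log_softmax(order_logits(order[:pos], remaining, theta))
        logp += log_probs[remaining.index(v)]
        remaining.remove(v)
    return logp


def parent_set_posterior(node, order, log_score, max_parents):
    """Exact posterior over parent sets of `node` given an order.

    Candidates are all subsets of the node's predecessors in the order with
    at most `max_parents` elements; a uniform prior over them is assumed.
    """
    preds = order[:order.index(node)]
    cands = [ps for k in range(min(max_parents, len(preds)) + 1)
             for ps in itertools.combinations(preds, k)]
    scores = [log_score(node, ps) for ps in cands]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return {ps: wi / z for ps, wi in zip(cands, w)}


def toy_score(node, parents):
    # Placeholder local score (hypothetical): penalise larger parent sets.
    return -1.0 * len(parents)


rng = random.Random(0)
d = 4
theta = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d + 1)]
order, logp = sample_order(theta, rng)
post = parent_set_posterior(order[-1], order, toy_score, max_parents=2)
```

Because each node's parent set is restricted to its predecessors in the order, acyclicity holds by construction and the sum over DAGs factorises into independent per-node sums. With at most K parents per node, each of these sums has polynomially many terms in the number of variables, which is what makes exact enumeration feasible in this sketch.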
