Causal Posterior Estimation

Simon Dirmeier, Antonietta Mira

TL;DR

CPE tackles simulation-based inference (SBI) for simulator models with intractable likelihoods by embedding the model's conditional dependencies directly into normalizing-flow (NF) architectures. It introduces continuous and discrete NF variants, a prior-aligned base distribution, and a rectified, constant-time sampling scheme that yields accurate posterior inferences with high sampling efficiency. By leveraging causal factorization, block matrix projections, and time/data conditioning, CPE outperforms or matches state-of-the-art baselines across nine SBI benchmarks while using fewer trainable parameters. The approach connects to structured semiseparable matrices for computational efficiency on accelerators and offers a pathway toward scalable posterior estimation in complex graphical models. Overall, CPE advances simulation-based Bayesian inference by building the graphical model structure into neural posterior estimation and enabling fast, accurate sampling.

Abstract

We present Causal Posterior Estimation (CPE), a novel method for Bayesian inference in simulator models, i.e., models where the evaluation of the likelihood function is intractable or too computationally expensive, but where one can simulate model outputs given parameter values. CPE utilizes a normalizing flow-based (NF) approximation to the posterior distribution which carefully incorporates the conditional dependence structure induced by the graphical representation of the model into the neural network. Thereby it is possible to improve the accuracy of the approximation. We introduce both discrete and continuous NF architectures for CPE and propose a constant-time sampling procedure for the continuous case which reduces the computational complexity of drawing samples to O(1) as for discrete NFs. We show, through an extensive experimental evaluation, that by incorporating the conditional dependencies induced by the graphical model directly into the neural network, rather than learning them from data, CPE is able to conduct highly accurate posterior inference either outperforming or matching the state of the art in the field.

Paper Structure

This paper contains 50 sections, 1 theorem, 49 equations, 8 figures.

Key Result

Theorem 1

Given vector fields $u_t(\theta|{\theta^{(1)}})$ that generate conditional probability paths $\varrho_t(\theta|{\theta^{(1)}})$, for any distribution $\pi({\theta^{(1)}})$, the marginal vector field $u_t$ in Equation app:marginal-vector-field generates the marginal probability path $\varrho_t$ in Eq
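The equations referenced in the theorem are not reproduced in this summary. For orientation only, and assuming the paper follows the standard flow-matching construction (an assumption, not confirmed by the text above), the marginal probability path and marginal vector field are usually written in the theorem's notation as:

$$
\varrho_t(\theta) = \int \varrho_t(\theta|{\theta^{(1)}})\, \pi({\theta^{(1)}})\, \mathrm{d}{\theta^{(1)}},
\qquad
u_t(\theta) = \int u_t(\theta|{\theta^{(1)}})\, \frac{\varrho_t(\theta|{\theta^{(1)}})\, \pi({\theta^{(1)}})}{\varrho_t(\theta)}\, \mathrm{d}{\theta^{(1)}}.
$$

Under this reading, the theorem states that averaging the conditional vector fields, weighted by how likely each ${\theta^{(1)}}$ is given the current point $\theta$, gives a field that transports the marginal path.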

Figures (8)

  • Figure 1: Graphical model of Equation \ref{eqn:hierarchical-model}.
  • Figure 2: CPE and baseline performance using an H-min metric (smaller values are better). CPE-RK denotes the CPE variant that uses a Runge-Kutta 5(4) solver while CPE-Euler uses a $20$-step Euler solver.
  • Figure 3: Sampling acceptance rates of all methods. CPE has consistently high acceptance rates when drawing a posterior sample, which reduces the total number of samples that need to be drawn. Results from all experimental evaluations (Figure \ref{fig:benchmark_tasks-hmin}) are pooled.
  • Figure 4: Continuous-time CPE architecture. Data $x$ and time $t$ are preprocessed using an MLP and a Fourier embedding + MLP, respectively, then concatenated and used as conditioning variables. We condition after a block transform, before applying an activation function (Equation \ref{eqn:linear-projection}). While it is possible to condition after each block transform (dashed lines), here we only do it after the first projection. Finally, to account for the prior program, we apply a convex combination (Equation \ref{eqn:convex-combination}).
  • Figure 5: CPE and baseline performance using a C2ST metric (smaller values are better; 0.5 is the best attainable value) when trained on a data set of size $10\,000$. CPE-RK denotes the CPE variant that uses a Runge-Kutta 5(4) solver while CPE-Euler uses a $20$-step Euler solver.
  • ...and 3 more figures
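
The captions above mention a CPE-Euler variant that draws samples with a $20$-step Euler solver. As a rough illustration of what fixed-step Euler sampling from a continuous flow looks like, here is a minimal NumPy sketch. The names `euler_sample` and `toy_vector_field` are hypothetical, and the toy vector field is a stand-in with straight-line paths toward a fixed point, not the trained CPE network.

```python
import numpy as np


def euler_sample(vector_field, theta0, n_steps=20):
    """Integrate d(theta)/dt = vector_field(theta, t) from t=0 to t=1
    using fixed-step explicit Euler (n_steps function evaluations)."""
    theta = np.asarray(theta0, dtype=float).copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # left endpoint of the current step, so t < 1 always
        theta = theta + dt * vector_field(theta, t)
    return theta


# Hypothetical stand-in for a learned vector field: every trajectory moves
# in a straight line from its start point to `target`, reaching it at t=1.
target = np.array([1.5, -0.5])


def toy_vector_field(theta, t):
    return (target - theta) / (1.0 - t)


rng = np.random.default_rng(0)
theta0 = rng.normal(size=(5, 2))  # draws from a standard normal base distribution
samples = euler_sample(toy_vector_field, theta0, n_steps=20)
```

Because the toy trajectories are straight lines with constant velocity, Euler integration is exact here up to floating-point error. A learned vector field generally has curved trajectories, so a fixed step count trades accuracy for a bounded number of network evaluations.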

Theorems & Definitions (2)

  • Theorem 1
  • proof