Simulation-based stacking

Yuling Yao, Bruno Régaldo-Saint Blancard, Justin Domke

TL;DR

This work tackles posterior miscalibration and non-mixing in simulation-based inference by introducing a general, theory-backed stacking framework that aggregates multiple SBI posteriors. It offers five concrete aggregation forms—density, sample, interval, rank, and moment stacking—each paired with tailored objective functions (KL-based log density, rank-based divergence, interval coverage, and moment matching) and proven asymptotic optimality under proper scoring rules. The unified framework accommodates hybrid stacking, combining multiple objectives to exploit complementary information, and provides practical guidance for training, validation, and computation. Empirical results on SBI benchmarks and a cosmological inference task demonstrate that stacking improves KL closeness to the true posterior, calibration of ranks and intervals, and accuracy of posterior moments, often with reduced computational burden. The approach has broad relevance for SBI practice and multi-run Bayesian computation, offering a principled path to robust, calibrated inference across complex scientific domains.

Abstract

Simulation-based inference has been popular for amortized Bayesian computation. It is typical to have more than one posterior approximation, from different inference algorithms, different architectures, or simply the randomness of initialization and stochastic gradients. With a consistency guarantee, we present a general posterior stacking framework to make use of all available approximations. Our stacking method is able to combine densities, simulation draws, confidence intervals, and moments, and address the overall precision, calibration, coverage, and bias of the posterior approximation at the same time. We illustrate our method on several benchmark simulations and a challenging cosmological inference task.
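
To make the idea of combining several posterior approximations concrete, here is a minimal sketch of one possible form of density stacking. It assumes a mixture $\sum_k w_k q_k(\theta|y)$ with weights on the simplex, fitted by maximizing the average log density over simulated $(\theta, y)$ pairs. The EM-style weight update, the toy conjugate-normal model, the three deliberately flawed approximations, and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x (elementwise)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def stack_density_weights(log_q, n_iter=500):
    """Fit simplex weights for a mixture of K fixed densities.

    log_q has shape (N, K) and holds log q_k(theta_n | y_n). The update below
    is the standard EM step for mixture weights with fixed components; it
    never decreases the average log density of the mixture.
    """
    n_sims, n_approx = log_q.shape
    w = np.full(n_approx, 1.0 / n_approx)
    # Subtracting the row maximum rescales each row by a constant, which
    # leaves the responsibilities unchanged but avoids underflow.
    q = np.exp(log_q - log_q.max(axis=1, keepdims=True))
    for _ in range(n_iter):
        mix = q * w
        resp = mix / mix.sum(axis=1, keepdims=True)
        w = resp.mean(axis=0)
    return w


def mean_log_score(log_q, w):
    """Average log density of the weighted mixture over the simulations."""
    row_max = log_q.max(axis=1, keepdims=True)
    return float(np.mean(row_max[:, 0] + np.log(np.exp(log_q - row_max) @ w)))


# Toy model (assumed): theta ~ N(0, 1), y | theta ~ N(theta, 1),
# so the true posterior is N(y / 2, 1 / 2).
rng = np.random.default_rng(0)
n_total = 2000
theta = rng.normal(0.0, 1.0, n_total)
y = theta + rng.normal(0.0, 1.0, n_total)

# Three hypothetical approximations: two biased in opposite directions,
# and one that ignores the data and returns the prior.
log_q = np.column_stack([
    log_normal_pdf(theta, y / 2 - 0.5, 0.5),
    log_normal_pdf(theta, y / 2 + 0.5, 0.5),
    log_normal_pdf(theta, 0.0, 1.0),
])

# Fit the weights on one half of the simulations, evaluate on the other half.
train, holdout = log_q[: n_total // 2], log_q[n_total // 2:]
weights = stack_density_weights(train)
stacked = mean_log_score(holdout, weights)
singles = [mean_log_score(holdout, np.eye(3)[k]) for k in range(3)]

print("stacking weights:", np.round(weights, 3))
print("holdout log score, stacked:", round(stacked, 3))
print("holdout log score, single approximations:", np.round(singles, 3))
```

In this toy setting the two oppositely biased approximations partly cancel in the mixture, which is why the stacked log score on holdout simulations can exceed that of every single approximation.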

Paper Structure (47 sections, 12 theorems, 66 equations, 7 figures, 4 tables)

This paper contains 47 sections, 12 theorems, 66 equations, 7 figures, 4 tables.

Key Result

Proposition 1

If the score $U$ is proper, then for any $\epsilon>0$ and any given $\mathbf{w}^{\prime}$, as $N\to \infty$, $\Pr \big( \mathbb{E}_{p(y, \theta)} U(q_{\hat{\mathbf{w}}}^*(\cdot | y), \theta) \leq \cdots$

Figures (7)

  • Figure 1: We run 1,000 neural posterior inferences in a challenging cosmology model. The rank histograms of one parameter reveal different types of miscalibration in four runs. The expected log densities of the 1,000 inferences vary by 1.7 nats, while stacking from this paper improves the best approximation by 1.4 nats on holdout simulations.
  • Figure 2: Stacking has two stages. The first is to sample $N$ prior simulations and run $K$ inference methods, $q_k( \cdot|y), 1\leq k \leq K$. The second stage learns a stacked posterior $q^*(\cdot |y)$ to better approximate the true posterior $p(\theta|y)$. We design stacking to facilitate different distribution combination forms and learning objectives.
  • Figure 3: Tension among objectives: Four approximate inferences have the same KL divergence to the true posterior, but differ substantially in bias, coverage, and rank calibration.
  • Figure 4: The KL divergence from the true posterior to the stacked posterior as the training size $N$ varies.
  • Figure 5: Examples of the true posterior and one inferred posterior in four models. We visualize two margins of the parameter.
  • ...and 2 more figures

Theorems & Definitions (18)

  • Proposition 1
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Lemma 1
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • ...and 8 more