Simulation-based stacking
Yuling Yao, Bruno Régaldo-Saint Blancard, Justin Domke
TL;DR
This work tackles posterior miscalibration and non-mixing in simulation-based inference (SBI) by introducing a general, theory-backed stacking framework that aggregates multiple SBI posteriors. It offers five concrete aggregation forms (density, sample, interval, rank, and moment stacking), paired with tailored objective functions (KL-based log density, rank-based divergence, interval coverage, and moment matching) that are proven asymptotically optimal under proper scoring rules. The framework also accommodates hybrid stacking, which combines multiple objectives to exploit complementary information, and comes with practical guidance on training, validation, and computation. Empirical results on SBI benchmarks and a cosmological inference task show that stacking brings the approximation closer to the true posterior in KL divergence, improves the calibration of ranks and intervals, and yields more accurate posterior moments, often at reduced computational cost. The approach is relevant to SBI practice and to multi-run Bayesian computation more broadly, offering a principled path to robust, calibrated inference in complex scientific domains.
Abstract
Simulation-based inference has been popular for amortized Bayesian computation. It is typical to have more than one posterior approximation, from different inference algorithms, different architectures, or simply the randomness of initialization and stochastic gradients. We present a general posterior stacking framework, with a consistency guarantee, that makes use of all available approximations. Our stacking method can combine densities, simulation draws, confidence intervals, and moments, and it addresses the overall precision, calibration, coverage, and bias of the posterior approximation at the same time. We illustrate our method on several benchmark simulations and a challenging cosmological inference task.
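To make the idea of density stacking concrete, the following is a minimal sketch, not the authors' implementation. It assumes constant (data-independent) simplex weights and a toy conjugate Gaussian model, and it fits the weights by maximizing the average log density of the mixture on simulated parameter-data pairs. The function names `stack_density_weights` and `mean_log_score`, the two deliberately misspecified approximations, and the EM-style fixed-point update are all illustrative choices made here, not details taken from the paper.

```python
import numpy as np

def log_normal(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def stack_density_weights(log_dens, n_iter=500, tol=1e-10):
    """Fit simplex weights w that maximize mean_i log sum_k w_k q_k(theta_i | y_i).

    log_dens has shape (n, K): log q_k(theta_i | y_i) for K approximations
    evaluated at n simulated (theta_i, y_i) pairs. Uses an EM-style update,
    which never decreases the objective (an illustrative optimizer choice).
    """
    n, K = log_dens.shape
    w = np.full(K, 1.0 / K)
    # Subtract the row-wise max for numerical stability; the per-row
    # responsibilities below are unchanged by this shift.
    dens = np.exp(log_dens - log_dens.max(axis=1, keepdims=True))
    for _ in range(n_iter):
        mix = dens * w
        resp = mix / mix.sum(axis=1, keepdims=True)
        w_new = resp.mean(axis=0)
        converged = np.abs(w_new - w).max() < tol
        w = w_new
        if converged:
            break
    return w

def mean_log_score(log_dens, w):
    """Average log density of the weighted mixture over the simulated pairs."""
    m = log_dens.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(log_dens - m) @ w)))

# Toy model (assumption): theta ~ N(0, 1), y | theta ~ N(theta, 1),
# so the exact posterior is N(y / 2, 1 / 2).
rng = np.random.default_rng(0)
n = 2000
theta = rng.normal(0.0, 1.0, size=n)
y = theta + rng.normal(0.0, 1.0, size=n)

# Two hypothetical approximations: one too wide, one too narrow.
log_dens = np.column_stack([
    log_normal(theta, y / 2, 1.0),
    log_normal(theta, y / 2, 0.4),
])

w = stack_density_weights(log_dens)
single_scores = log_dens.mean(axis=0)
stacked_score = mean_log_score(log_dens, w)
print("weights:", w)
print("single log scores:", single_scores)
print("stacked log score:", stacked_score)
```

In this toy setting neither approximation has the right spread, so the fitted mixture places weight on both and attains a higher average log score than either one alone. In practice the weights would be fit on one set of simulations and checked on held-out simulations.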
