Bayesian score calibration for approximate models
Joshua J Bon, David J Warne, David J Nott, Christopher Drovandi
TL;DR
This work tackles the challenge of performing Bayesian inference when the target model has an intractable likelihood but can still be simulated. It introduces Bayesian score calibration, which learns a data-dependent transformation of an inexpensive approximate posterior by maximizing a strictly proper scoring rule, notably the energy score, over a small number of simulated calibration datasets. Supporting theory, via a general result for simulation-based inference (SBI), shows that, with a sufficiently rich family of pushforward transformations, the calibrated posterior can recover the true posterior conditional on simulated data; a practical diagnostic assesses calibration quality. Empirically, the method reduces bias and improves posterior coverage across Ornstein–Uhlenbeck (OU), Lotka–Volterra, and MAPK-like models, while remaining computationally efficient and scalable, since the expensive target-model simulations are limited to a modest number. The framework offers a flexible, post-hoc correction for surrogate-based Bayesian inference and can be used with a range of surrogate likelihoods and approximate inference techniques.
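For reference, a standard positively oriented form of the energy score is sketched below. The notation is an assumption made here and may differ from the paper's: P denotes the adjusted posterior, θ the parameter value that generated a calibration dataset, and X, X' independent draws from P.

```latex
S(P, \theta)
  = \frac{1}{2}\,\mathbb{E}_{X, X' \sim P}\,\lVert X - X' \rVert_2^{\beta}
  - \mathbb{E}_{X \sim P}\,\lVert X - \theta \rVert_2^{\beta},
  \qquad \beta \in (0, 2).
```

The second term rewards draws that land close to θ, while the first term rewards spread and so penalizes overconfident distributions. For β in (0, 2) this score is strictly proper over distributions with a finite β-th moment, which is the property the calibration objective relies on.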
Abstract
Scientists continue to develop increasingly complex mechanistic models to reflect their knowledge more realistically. Statistical inference using these models can be challenging since the corresponding likelihood function is often intractable and model simulation may be computationally burdensome. Fortunately, in many of these situations it is possible to adopt a surrogate model or approximate likelihood function. It may be convenient to conduct Bayesian inference directly with a surrogate, but this can result in a posterior with poor uncertainty quantification. In this paper, we propose a new method for adjusting approximate posterior samples to reduce bias and improve posterior coverage properties. We do this by optimizing a transformation of the approximate posterior so that the transformed posterior maximizes a scoring rule. Our approach requires only a (fixed) small number of complex model simulations and is numerically stable. We develop supporting theory for our method and demonstrate beneficial corrections to approximate posteriors across several examples of increasing complexity.
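The adjustment described above can be illustrated with a small sketch. It is not the authors' implementation: the conjugate normal toy model, the deliberately biased and overconfident surrogate posterior, the affine transformation, the grid search, and every name in the code (`energy_score`, `transform`, `average_score`) are assumptions made for illustration. Calibration parameters are drawn from the prior here purely for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

M, n = 1000, 100  # calibration datasets, surrogate-posterior draws per dataset

# Hypothetical target model: theta ~ N(0, 1), y | theta ~ N(theta, 1),
# so the exact posterior is N(y / 2, 1 / 2).
theta = rng.normal(0.0, 1.0, size=M)   # parameters that generate the datasets
y = rng.normal(theta, 1.0)             # one simulated dataset per parameter

# Hypothetical surrogate posterior: biased by +0.5 and overconfident (variance 0.1).
approx = rng.normal((y / 2 + 0.5)[:, None], np.sqrt(0.1), size=(M, n))
centre = approx.mean(axis=1)           # per-dataset surrogate posterior mean


def energy_score(samples, target):
    """Monte Carlo energy score (beta = 1), larger is better, one value per row."""
    accuracy = np.abs(samples - target[:, None]).mean(axis=1)
    # Pair each draw with its neighbour to estimate E|X - X'| in O(n).
    spread = np.abs(samples - np.roll(samples, 1, axis=1)).mean(axis=1)
    return 0.5 * spread - accuracy


def transform(samples, centre, scale, shift):
    """Affine pushforward: rescale about the surrogate mean, then shift."""
    return centre[:, None] + scale * (samples - centre[:, None]) + shift


def average_score(scale, shift):
    """Average energy score of the transformed draws over all calibration datasets."""
    return energy_score(transform(approx, centre, scale, shift), theta).mean()


# Coarse grid search over the two transformation parameters (an assumption;
# any optimizer could be substituted).
scales = np.linspace(1.0, 3.5, 26)
shifts = np.linspace(-1.0, 0.0, 21)
scores = np.array([[average_score(a, b) for b in shifts] for a in scales])
i, j = np.unravel_index(scores.argmax(), scores.shape)
best_scale, best_shift = scales[i], shifts[j]

print(f"scale = {best_scale:.2f}, shift = {best_shift:.2f}")
```

In this toy setting the exact posterior lies inside the affine family, at a scale of sqrt(0.5 / 0.1), roughly 2.24, and a shift of -0.5, so the maximizer should land near those values up to Monte Carlo noise and grid resolution. The objective is fairly flat in the scale parameter, which is why a reasonably large number of calibration datasets is used in this sketch.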
