Bayesian score calibration for approximate models

Joshua J Bon, David J Warne, David J Nott, Christopher Drovandi

TL;DR

This work tackles Bayesian inference when the target model has an intractable likelihood but can still be simulated. It introduces Bayesian score calibration, which learns a data-aware transformation of an inexpensive approximate posterior by maximizing a strictly proper scoring rule, notably the energy score, over a small number of simulated calibration datasets. Theoretical justification (via a general simulation-based inference (SBI) result) guarantees that, with a sufficiently rich family of pushforward transformations, the calibrated posterior can recover the true posterior conditional on simulated data; a practical diagnostic assesses calibration quality. Empirically, the method reduces bias and improves posterior coverage across Ornstein–Uhlenbeck (OU), Lotka–Volterra, and MAPK-like models, while remaining computationally efficient and scalable, since the expensive target-model evaluations are limited to a modest number of simulations. The framework offers a flexible, post-hoc correction for surrogate Bayesian inferences and can be used with various surrogate likelihoods and approximate inference techniques.

Abstract

Scientists continue to develop increasingly complex mechanistic models to reflect their knowledge more realistically. Statistical inference using these models can be challenging since the corresponding likelihood function is often intractable and model simulation may be computationally burdensome. Fortunately, in many of these situations it is possible to adopt a surrogate model or approximate likelihood function. It may be convenient to conduct Bayesian inference directly with a surrogate, but this can result in a posterior with poor uncertainty quantification. In this paper, we propose a new method for adjusting approximate posterior samples to reduce bias and improve posterior coverage properties. We do this by optimizing a transformation of the approximate posterior, the result of which maximizes a scoring rule. Our approach requires only a (fixed) small number of complex model simulations and is numerically stable. We develop supporting theory for our method and demonstrate beneficial corrections to approximate posteriors across several examples of increasing complexity.
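
As an illustration of the scoring rule named in the TL;DR, the following is a minimal sketch of a Monte Carlo estimate of the energy score. It assumes the positively oriented form $S(U, \theta) = \tfrac{1}{2}\,\mathbb{E}\|u - u'\|^{\beta} - \mathbb{E}\|u - \theta\|^{\beta}$ with $u, u' \sim U$ independent, so that larger values are better, consistent with "maximizing" a score. The function name `energy_score` and the default $\beta = 1$ are choices made for this example and are not taken from the paper.

```python
import numpy as np

def energy_score(samples, theta, beta=1.0):
    """Monte Carlo estimate of the positively oriented energy score.

    samples: array of shape (N, d), draws from the candidate distribution U.
    theta:   array of shape (d,), the reference (data-generating) parameter.
    beta:    exponent in (0, 2); an assumed default of 1 is used here.
    """
    samples = np.asarray(samples, dtype=float)
    theta = np.asarray(theta, dtype=float)
    n = samples.shape[0]

    # Estimate E||u - theta||^beta with a plain sample average.
    dist_to_theta = np.linalg.norm(samples - theta, axis=1) ** beta

    # Estimate E||u - u'||^beta from all distinct pairs (the diagonal is zero).
    diffs = samples[:, None, :] - samples[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=2) ** beta
    pairwise_mean = pairwise.sum() / (n * (n - 1))

    return 0.5 * pairwise_mean - dist_to_theta.mean()

# Usage: draws centered on the reference value score higher than
# the same draws evaluated against a distant reference value.
rng = np.random.default_rng(1)
draws = rng.normal(size=(500, 2))
score_near = energy_score(draws, np.zeros(2))
score_far = energy_score(draws, np.full(2, 5.0))
```

The all-pairs term costs $O(N^2)$ memory and time, which is acceptable for a few hundred samples per calibration dataset; a subsampled pairing would be one way to reduce this for larger $N$.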

Paper Structure

This paper contains 31 sections, 3 theorems, 38 equations, 10 figures, 1 table, and 1 algorithm.

Key Result

Theorem 2

Consider a strictly proper scoring rule $S$ relative to the class of distributions $\mathcal{P}$, and an importance distribution $\bar{\Pi}$ with Radon–Nikodym derivative $r = \mathrm{d} \Pi / \mathrm{d} \bar{\Pi}$, where $\Pi$ is the prior. Let $v:\mathsf{Y} \rightarrow [0,\infty)$ and define $Q$ by cha… If the family of kernels $\mathcal{K}$ is sufficiently rich with respect to $(Q,\mathcal{P})$ then …

Figures (10)

  • Figure 1: Graphical overview of Bayesian score calibration. First, the importance distribution $\bar{\Pi}$ and data-generating process $P(\,\cdot \mid \theta)$ simulate parameter-data pairs $(\bar{\theta}^{(m)}, \tilde{y}^{(m)})$ for $m \in \{1,\ldots,M\}$. Each simulated data set, $\tilde{y}^{(m)}$, defines a new approximate posterior, $\hat{\Pi}(\,\cdot \mid \tilde{y}^{(m)})$, which we approximate with Monte Carlo samples. Second, we use a strictly proper scoring rule $S$ to find the best transformation of the approximate posterior, defining the pushforward distribution $f_{\sharp}\hat{\Pi}(\,\cdot \mid \tilde{y})$, with respect to the true data-generating parameter $\bar{\theta}$, averaged over $\bar{\Pi}$, with weights $w(\bar{\theta}, \tilde{y})$. The optimization objective function is approximated using Monte Carlo with pre-computed samples from the generation step. Finally, the optimal function, $f^\star$, is used to generate samples from the adjusted approximate posterior with the observed data, $y$, and to produce data for diagnostic summaries.
  • Figure 2: Univariate density estimates of approximations to the OU Process model posterior distribution from a single simulation. The original approximate posterior (Approx-post) and adjusted posteriors (Adjust-post) with $(\alpha)$ clipping are shown with solid lines. The true posterior (True-post) is shown with a dashed line. The true generating parameter value is indicated with a cross $(\times)$.
  • Figure 3: Posterior summaries of the bivariate OU Process model from a single simulation. Plot (a) shows 50% and 90% credible region probability (CR Prob) contours from a Gaussian approximation to the bivariate density of $\rho$ and $D$. The original approximate posterior (Approx-post) and adjusted posteriors (Adjust-post) with $(\alpha)$ clipping are shown with solid lines. The true posterior (True-post) is shown with a dashed line. The true generating parameter value is indicated with a cross $(\times)$. Plot (b) is a calibration diagnostic showing the marginal miscoverage for all parameters (see Section \ref{sec:calcheck}) for $\alpha = 1$, with $\pm 0.1$ deviation from parity shown with a dotted line.
  • Figure 4: Comparison of the original approximate posterior (Approx-post) and adjusted posteriors (Adjust-post) for the Lotka–Volterra example with EKF likelihood. Plot (a) shows the estimated marginal posterior densities, with the true generating parameter value indicated with a cross $(\times)$. Plot (b) is a calibration diagnostic showing the marginal miscoverage for all parameters, with $\pm 0.1$ deviation from parity shown with a dotted line.
  • Figure 5: Comparison of the original approximate posterior (Approx-post) and adjusted posteriors (Adjust-post) for the reaction network example with EKF likelihood. Plot (a) shows the estimated marginal posterior densities, with the true generating parameter value indicated with a cross $(\times)$. Plot (b) is a calibration diagnostic showing the marginal miscoverage for all parameters, with $\pm 0.1$ deviation from parity shown with a dotted line.
  • ...and 5 more figures
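
The three steps in the Figure 1 caption (simulate, optimize, apply) can be sketched on a toy problem. Everything below is an assumption made for illustration and is not the paper's implementation: the target model is $y \mid \theta \sim N(\theta, 1)$ with prior $\theta \sim N(0, 1)$, the importance distribution is taken to be the prior with unit weights, the hypothetical `approx_posterior_samples` is deliberately shifted and too narrow, the transformation family is a location-scale map about the sample mean, and a coarse grid search stands in for a proper optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy_score_1d(u, theta):
    """Positively oriented energy score (beta = 1) estimated from 1-D samples u."""
    n = u.size
    pairwise = np.abs(u[:, None] - u[None, :]).sum() / (n * (n - 1))
    return 0.5 * pairwise - np.abs(u - theta).mean()

def approx_posterior_samples(y, n, rng):
    """Hypothetical cheap approximation: biased by +0.5 and far too narrow."""
    return rng.normal(y / 2 + 0.5, 0.1, size=n)

def transform(u, b, s):
    """Location-scale map about the sample mean: f(u) = mean + s * (u - mean) + b."""
    m = u.mean()
    return m + s * (u - m) + b

# Step 1: simulate parameter-data pairs and approximate posterior samples.
M, N = 100, 50
theta_bar = rng.normal(0.0, 1.0, size=M)   # importance distribution (assumed: the prior)
y_tilde = rng.normal(theta_bar, 1.0)       # one observation per pair from the toy model
approx = [approx_posterior_samples(y, N, rng) for y in y_tilde]

# Step 2: pick (b, s) maximizing the average score (unit weights assumed).
def objective(b, s):
    return np.mean([energy_score_1d(transform(u, b, s), t)
                    for u, t in zip(approx, theta_bar)])

b_grid = np.linspace(-1.0, 1.0, 21)
s_grid = np.linspace(1.0, 12.0, 12)
scores = np.array([[objective(b, s) for s in s_grid] for b in b_grid])
i, j = np.unravel_index(np.argmax(scores), scores.shape)
b_star, s_star = b_grid[i], s_grid[j]

# Step 3: apply the fitted map to the approximate posterior for observed data.
y_obs = 1.3
adjusted = transform(approx_posterior_samples(y_obs, 1000, rng), b_star, s_star)
```

For this toy model the exact posterior is $N(y/2, 1/2)$, so a well-chosen map would be expected to shift by roughly $-0.5$ and inflate the spread by a factor of about $7$; the grid search recovers this only up to Monte Carlo error and grid resolution. The pre-computed samples in `approx` are reused across all candidate $(b, s)$, so no further target-model simulations are needed during optimization.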

Theorems & Definitions (10)

  • Definition 1: Sufficiently rich kernel family
  • Theorem 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Theorem 6
  • Theorem 7
  • Remark 8
  • Remark 9
  • Remark 10