Validating Bayesian Inference Algorithms with Simulation-Based Calibration

Sean Talts, Michael Betancourt, Daniel Simpson, Aki Vehtari, Andrew Gelman

TL;DR

Simulation-Based Calibration (SBC) provides a general framework for validating Bayesian inference algorithms that does not require analytically known posteriors. It uses the Bayesian joint distribution $\pi(y,\theta)=\pi(y \mid \theta)\,\pi(\theta)$. Parameters $\tilde{\theta}$ are drawn from the prior, data $\tilde{y}$ are simulated from them, and the algorithm under test produces draws from the posterior $\pi(\theta \mid \tilde{y})$. Over many simulated replications, SBC records the rank of each prior draw among its corresponding posterior draws. For a correct algorithm the data-averaged posterior equals the prior, so these ranks should be uniformly distributed. Deviations from uniformity reveal problems such as autocorrelated posterior samples, misspecified model implementations, or biased algorithms (e.g., MCMC, ADVI, and INLA). The paper demonstrates this with misspecified priors, biased MCMC, and spatial modeling examples. SBC is computationally intensive but highly parallelizable, and it complements posterior predictive checks as part of a robust Bayesian workflow.
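The SBC loop described above can be sketched as follows. This sketch assumes a toy conjugate model: a normal prior on a mean and a normal likelihood with known noise. Under that model, exact independent posterior draws can stand in for the algorithm under test. The function name `sbc_ranks` and the `posterior_scale` knob are hypothetical illustrations, not code from the paper. The knob deliberately widens or narrows the posterior to mimic a broken algorithm.

```python
import numpy as np


def sbc_ranks(n_sims=1000, n_post=99, prior_sd=1.0, noise_sd=1.0, n_obs=5,
              posterior_scale=1.0, seed=0):
    """Hypothetical SBC sketch for a conjugate normal model.

    Model (assumed for illustration):
        theta ~ N(0, prior_sd^2)
        y_i | theta ~ N(theta, noise_sd^2),  i = 1, ..., n_obs
    The exact posterior is normal, so independent posterior draws are available.
    posterior_scale != 1 mis-scales the posterior sd to mimic a faulty algorithm.
    """
    rng = np.random.default_rng(seed)
    ranks = np.empty(n_sims, dtype=int)
    for n in range(n_sims):
        theta_tilde = rng.normal(0.0, prior_sd)               # draw from the prior
        y = rng.normal(theta_tilde, noise_sd, size=n_obs)     # simulate data
        post_prec = 1.0 / prior_sd**2 + n_obs / noise_sd**2   # conjugate update
        post_mean = (y.sum() / noise_sd**2) / post_prec
        post_sd = posterior_scale / np.sqrt(post_prec)
        draws = rng.normal(post_mean, post_sd, size=n_post)   # L posterior draws
        ranks[n] = np.sum(draws < theta_tilde)                # rank in {0, ..., L}
    return ranks
```

With $L = 99$ posterior draws the ranks take values $0, \ldots, 99$, so `np.bincount(ranks // 5, minlength=20)` yields 20 equally likely bins. With `posterior_scale=2.0` the middle bins fill up at the expense of the edges. This is the $\cap$-shaped pattern associated with an overdispersed computed posterior.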

Abstract

Verifying the correctness of Bayesian computation is challenging. This is especially true for complex models that are common in practice, as these require sophisticated model implementations and algorithms. In this paper we introduce simulation-based calibration (SBC), a general procedure for validating inferences from Bayesian algorithms capable of generating posterior samples. This procedure not only identifies inaccurate computation and inconsistencies in model implementations but also provides graphical summaries that can indicate the nature of the problems that arise. We argue that SBC is a critical part of a robust Bayesian workflow, as well as being a useful tool for those developing computational algorithms and statistical software.

Paper Structure

This paper contains 18 sections, 2 theorems, 20 equations, 13 figures, 2 algorithms.

Key Result

Theorem 1

Let $\tilde{\theta} \sim \pi(\theta)$, $\tilde{y} \sim \pi(y \mid \tilde{\theta})$, and $\left\{ \theta_{1}, \ldots, \theta_{L} \right\} \sim \pi(\theta \mid \tilde{y})$ for any joint distribution $\pi(y, \theta)$. The rank statistic of any one-dimensional random variable over $\theta$ is uniformly distributed over the integers $[0, L]$.
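For reference, here is the rank statistic in Theorem 1 written out for a scalar function $f$ of $\theta$. This form assumes $f(\theta)$ is continuous, so that ties occur with probability zero:

```latex
r\big(\{f(\theta_1), \ldots, f(\theta_L)\},\, f(\tilde{\theta})\big)
  = \sum_{l=1}^{L} \mathbb{I}\big[f(\theta_l) < f(\tilde{\theta})\big],
\qquad
\Pr(r = k) = \frac{1}{L + 1}, \quad k = 0, 1, \ldots, L.
```

The key step is this: conditional on $\tilde{y}$, the prior draw $\tilde{\theta}$ is itself distributed as $\pi(\theta \mid \tilde{y})$. Therefore $\tilde{\theta}, \theta_1, \ldots, \theta_L$ are exchangeable, and $f(\tilde{\theta})$ is equally likely to occupy any of the $L + 1$ positions among the sorted values.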

Figures (13)

  • Figure 1: The procedure of Cook et al. (2006) applied to a linear regression analysis with Stan indicates significant problems despite the analysis itself being correct. In particular, the histogram of estimated CDF values (red) exhibits strong systematic deviations from the variation expected of a uniform histogram (gray).
  • Figure 2: The SBC algorithm for MCMC applied to a linear regression analysis indicates no issues, as the empirical rank statistics (red) are consistent with the variation expected of a uniform histogram (gray).
  • Figure 3: Uniformly distributed rank statistics are consistent with the ranks being computed from independent samples from the exact posterior of a correctly specified model.
  • Figure 4: The spikes at the boundaries of the SBC histogram indicate that posterior samples possess non-negligible autocorrelation.
  • Figure 5: A symmetric, $\cap$-shaped distribution indicates that the computed data-averaged posterior distribution (dark red) is overdispersed relative to the prior distribution (light red). This implies that on average the computed posterior will be wider than the true posterior.
  • ...and 8 more figures
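The gray bands in the figures above show the variation expected of a uniform histogram. The following is one way to compute such a band; it is assumed here rather than taken from the paper's code. Under uniformity, each of $B$ bins receives a Binomial$(N, 1/B)$ count out of $N$ simulations. The band can then be taken from central binomial quantiles (99% by default below). The helper names are hypothetical, and only the Python standard library is used.

```python
import math


def binom_quantile(q, n, p):
    """Smallest k with P(X <= k) >= q for X ~ Binomial(n, p)."""
    cdf = 0.0
    for k in range(n + 1):
        # log-space pmf avoids overflow in the binomial coefficient
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        cdf += math.exp(log_pmf)
        if cdf >= q:
            return k
    return n


def uniform_band(n_sims, n_bins, coverage=0.99):
    """Central interval for a single bin count under a uniform rank histogram."""
    p = 1.0 / n_bins
    tail = (1.0 - coverage) / 2.0
    return binom_quantile(tail, n_sims, p), binom_quantile(1.0 - tail, n_sims, p)


def bins_outside_band(counts, coverage=0.99):
    """Indices of histogram bins whose counts fall outside the uniform band."""
    n_sims = sum(counts)
    lo, hi = uniform_band(n_sims, len(counts), coverage)
    return [b for b, c in enumerate(counts) if c < lo or c > hi]
```

For example, `bins_outside_band([120] + [46] * 18 + [120])` flags bins 0 and 19. These are the kind of boundary spikes that Figure 4 attributes to autocorrelated posterior samples. Each bin is checked separately, so an occasional bin outside the band is expected even for a correct algorithm. Systematic shapes are what matter.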

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • proof