Simulation-based Bayesian inference under model misspecification

Ryan P. Kelly, David J. Warne, David T. Frazier, David J. Nott, Michael U. Gutmann, Christopher Drovandi

TL;DR

This paper addresses model misspecification in simulation-based Bayesian inference (SBI) and surveys strategies for robust inference when the simulator cannot fully capture the data-generating process. It organises the discussion around three approaches: robust summary statistics, generalised Bayesian inference (GBI) with robust loss functions, and explicit error modelling with adjustment parameters. It demonstrates their impact on a misspecified MA(1) running example. The authors connect existing SBI methods, namely approximate Bayesian computation (ABC), Bayesian synthetic likelihood (BSL), and neural conditional density estimation (NCDE), to misspecification theory, describe how each responds to misspecification, and show that robust methods can recover meaningful inference and improve predictive checks. The discussion emphasises a principled Bayesian workflow with model checking and practical diagnostics, and outlines future directions for theory, benchmarks, and scalable robust neural SBI methods.

Abstract

Simulation-based Bayesian inference (SBI) methods are widely used for parameter estimation in complex models where evaluating the likelihood is challenging but generating simulations is relatively straightforward. However, these methods commonly assume that the simulation model accurately reflects the true data-generating process, an assumption that is frequently violated in realistic scenarios. In this paper, we focus on the challenges faced by SBI methods under model misspecification. We consolidate recent research aimed at mitigating the effects of misspecification, highlighting three key strategies: i) robust summary statistics, ii) generalised Bayesian inference, and iii) error modelling and adjustment parameters. To illustrate both the vulnerabilities of popular SBI methods and the effectiveness of misspecification-robust alternatives, we present empirical results on an illustrative example.
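To make the misspecified MA(1) setting more concrete, the following is a minimal sketch of rejection ABC on an MA(1) simulator, not the authors' implementation. Several details are assumptions made only for illustration: the summaries are taken to be the lag-0 and lag-1 sample autocovariances, the prior is uniform on $(-1, 1)$, and the "observed" data are stand-in white noise with standard deviation 0.5 rather than the paper's data-generating process. All function names are hypothetical. Under these assumptions, and with unit-variance noise, the large-sample limit of the summaries is $b(\theta) = (1 + \theta^2, \theta)$, so no value of $\theta$ can reproduce an observed variance below 1.

```python
import numpy as np


def simulate_ma1(theta, n, rng):
    """Simulate y_t = e_t + theta * e_{t-1} with standard normal e_t."""
    e = rng.standard_normal(n + 1)
    return e[1:] + theta * e[:-1]


def summaries(y):
    """Lag-0 and lag-1 sample autocovariances (assumed summary choice)."""
    n = len(y)
    return np.array([np.dot(y, y) / n, np.dot(y[1:], y[:-1]) / n])


def binding(theta):
    """Large-sample limit of the summaries under the assumed MA(1) model."""
    return np.array([1.0 + theta**2, theta])


def abc_rejection(s_obs, n, n_sims, quantile, rng):
    """Keep the prior draws whose simulated summaries are closest to s_obs."""
    thetas = rng.uniform(-1.0, 1.0, size=n_sims)  # assumed uniform prior
    dists = np.array([
        np.linalg.norm(summaries(simulate_ma1(t, n, rng)) - s_obs)
        for t in thetas
    ])
    eps = np.quantile(dists, quantile)  # tolerance set as a distance quantile
    keep = dists <= eps
    return thetas[keep], dists[keep]


rng = np.random.default_rng(0)
n = 200

# Stand-in "observed" data (an assumption, not the paper's process):
# white noise with variance 0.25, which the MA(1) model cannot match.
y_obs = 0.5 * rng.standard_normal(n)
s_obs = summaries(y_obs)

post, dists = abc_rejection(s_obs, n, n_sims=5000, quantile=0.02, rng=rng)
print("observed summaries:", s_obs)
print("accepted draws:", len(post))
print("ABC posterior mean of theta:", post.mean())
print("smallest accepted distance:", dists.min())
```

In this sketch the accepted draws cluster around $\theta = 0$, where $b(\theta)$ is closest to the observed summaries in Euclidean distance, while the smallest accepted distance stays well away from zero. That residual gap is the kind of incompatibility between observed and simulated summaries that the predictive checks in the paper are meant to reveal.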

Paper Structure

This paper contains 18 sections, 23 equations, 8 figures, 3 algorithms.

Figures (8)

  • Figure 1: Comparison of the full observed data (dashed line) and simulated data at the pseudo-true parameter $\theta=0$ (solid line).
  • Figure 2: Left: ABC posterior for $\theta$, with the pseudo-true value marked by a vertical dashed line at $\theta=0$. Right: Simulated summaries for the misspecified MA(1) example, showing observed summary ($\times$), simulated summaries (circles), and the binding function $b(\theta)$ (solid curve).
  • Figure 3: Left: BSL posterior for $\theta$ for the misspecified MA(1) example. Right: Posterior predictive summaries (circles), observed summary ($\times$), and the binding function $b(\theta)$ (solid curve).
  • Figure 4: Top row: NPE results. Left: Posterior for $\theta$, with the pseudo-true value at $\theta = 0$ indicated by a vertical dashed line. Right: Posterior predictive summaries (circles), observed summary ($\times$), and the binding function $b(\theta)$ (solid curve). Bottom row: Corresponding NLE results, with posterior on the left and posterior predictive checks on the right.
  • Figure 5: Approximate ABC posteriors under three discrepancy choices—Euclidean distance, KL divergence, and MMD. The dashed line marks the pseudo-true value at $\theta = 0$.
  • ...and 3 more figures