Tests for model misspecification in simulation-based inference: from local distortions to global model checks
Noemi Anau Montel, James Alvey, Christoph Weniger
TL;DR
This work tackles misspecification in simulation-based inference (SBI) by introducing distortion-driven tests that treat a base simulator as $H_0$ and a broad ensemble of augmented simulators as $H_i$. Central to the approach are localized test statistics $t_i(x) = -2 \ln \frac{p(x|H_0)}{p(x|H_i)}$ and their aggregate $t_{sum}(x) = \sum_i t_i(x)$, with global p-values computed via Monte Carlo sampling to account for multiple correlated tests. The authors present two training strategies, BCE (classifier-based) and SNR (matched-filter-based), to efficiently learn these statistics, and demonstrate connections to classical frameworks (matched filtering, $\chi^2$ goodness-of-fit). They validate the framework on a toy example and apply it to GW150914, showing no significant misspecification while providing a rich diagnostic tool for end-to-end SBI analyses. An adaptive, self-calibrating distortions algorithm further enhances practical applicability by tuning distortion amplitudes to remain plausible given observational noise. The approach offers a flexible, principled path toward robust SBI pipelines in physics and astrophysics, enabling thorough discrepancy detection and interpretation beyond parameter estimation.
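To make the summed statistic and its Monte Carlo global p-value concrete, here is a minimal sketch on a toy problem. It is not the authors' implementation: the statistics are written analytically rather than learned with the BCE or SNR strategies. The sketch assumes unit-variance Gaussian noise with independent bins, and assumes each alternative shifts the mean of a single bin i by a fixed amplitude `eps`. Under those assumptions the local statistic reduces to 2 eps x_i - eps^2. The names `local_stats` and `global_p_value` are hypothetical.

```python
import numpy as np


def local_stats(x, eps=1.0):
    """Toy local statistics t_i(x) = -2 ln[p(x|H0) / p(x|Hi)].

    Assumption: H0 is x ~ N(0, I) and Hi shifts the mean of bin i by eps,
    which gives t_i(x) = 2 * eps * x_i - eps**2.
    """
    x = np.asarray(x, dtype=float)
    return 2.0 * eps * x - eps**2


def global_p_value(x_obs, n_sims=10_000, eps=1.0, seed=0):
    """Monte Carlo p-value of t_sum(x_obs) under the base model H0."""
    rng = np.random.default_rng(seed)
    t_obs = local_stats(x_obs, eps).sum()
    # Simulate data sets from the base simulator (here: pure Gaussian noise).
    x_h0 = rng.standard_normal((n_sims, len(x_obs)))
    t_h0 = local_stats(x_h0, eps).sum(axis=1)
    # Add-one correction keeps the estimate strictly positive.
    return (1 + np.sum(t_h0 >= t_obs)) / (1 + n_sims)


if __name__ == "__main__":
    print(global_p_value(np.zeros(20)))       # data consistent with H0
    print(global_p_value(np.full(20, 3.0)))   # strongly distorted data
```

Because the null distribution of the summed statistic is built from simulations rather than from an asymptotic formula, the same recipe should still apply when the individual tests are correlated, which is the situation described above.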
Abstract
Model misspecification analysis strategies, such as anomaly detection, model validation, and model comparison, are a key component of scientific model development. Over the last few years, there has been a rapid rise in the use of simulation-based inference (SBI) techniques for Bayesian parameter estimation, applied to increasingly complex forward models. To move towards fully simulation-based analysis pipelines, however, there is an urgent need for a comprehensive simulation-based framework for model misspecification analysis. In this work, we provide a solid and flexible foundation for a wide range of model discrepancy analysis tasks, using distortion-driven model misspecification tests. From a theoretical perspective, we introduce the statistical framework built around performing many hypothesis tests for distortions of the simulation model. We also make explicit analytic connections to classical techniques: anomaly detection, model validation, and goodness-of-fit residual analysis. Furthermore, we introduce an efficient self-calibrating training algorithm that is useful for practitioners. We demonstrate the performance of the framework in multiple scenarios, making the connection to classical results where they are valid. Finally, we show how to conduct such a distortion-driven model misspecification test for real gravitational wave data, specifically on the event GW150914.
