Minimum Distance Summaries for Robust Neural Posterior Estimation

Sherman Khoo, Dennis Prangle, Song Liu, Mark Beaumont

TL;DR

This paper tackles robustness in simulation-based inference by decoupling robustness from the amortized neural posterior estimator (NPE). It introduces minimum-distance summaries (MDS), which adapt test-time summaries by minimizing a robust discrepancy—implemented via maximum mean discrepancy (MMD)—between the summary-conditioned data distribution and the observed data, while keeping the pretrained NPE fixed. The method leverages random Fourier features to obtain a lightweight, model-free test-time adaptation, and an amortized decoder mean embedding to avoid expensive density estimation. The authors provide theoretical guarantees for robustness under misspecification and posterior consistency under correct specification, and demonstrate substantial robustness gains across Gaussian, time-series, and cryo-EM benchmarks with minimal overhead. Overall, MDS preserves amortization, enables post-hoc robustness, and offers a practical, scalable approach to robust SBI in real-world settings.

Abstract

Simulation-based inference (SBI) enables amortized Bayesian inference by first training a neural posterior estimator (NPE) on prior-simulator pairs, typically through low-dimensional summary statistics, which can then be cheaply reused for fast inference by querying it on new test observations. Because NPE is estimated under the training data distribution, it is susceptible to misspecification when observations deviate from the training distribution. Many robust SBI approaches address this by modifying NPE training or introducing error models, coupling robustness to the inference network and compromising amortization and modularity. We introduce minimum-distance summaries, a plug-in robust NPE method that adapts queried test-time summaries independently of the pretrained NPE. Leveraging the maximum mean discrepancy (MMD) as a distance between observed data and a summary-conditional predictive distribution, the adapted summary inherits strong robustness properties from the MMD. We demonstrate that the algorithm can be implemented efficiently with random Fourier feature approximations, yielding a lightweight, model-free test-time adaptation procedure. We provide theoretical guarantees for the robustness of our algorithm and empirically evaluate it on a range of synthetic and real-world tasks, demonstrating substantial robustness gains with minimal additional overhead.
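The random Fourier feature approximation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel with a fixed lengthscale and estimates the squared MMD between two samples as the squared distance between their averaged feature vectors. The function names, the kernel choice, the lengthscale, and the toy data are all illustrative assumptions, not the authors' implementation, and the sketch does not perform the summary adaptation itself.

```python
import math
import random

def make_rff(dim, num_features, lengthscale, rng):
    """Draw frequencies and phases for a Gaussian-kernel feature map.

    Assumption: kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)),
    whose spectral distribution is N(0, lengthscale^-2 * I).
    """
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases

def mean_embedding(samples, freqs, phases):
    """Average the features sqrt(2/D) * cos(w.x + b) over a sample."""
    scale = math.sqrt(2.0 / len(freqs))
    emb = [0.0] * len(freqs)
    for x in samples:
        for j, (w, b) in enumerate(zip(freqs, phases)):
            emb[j] += scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [e / len(samples) for e in emb]

def mmd2_rff(xs, ys, freqs, phases):
    """Approximate squared MMD as the squared distance between mean embeddings."""
    mx = mean_embedding(xs, freqs, phases)
    my = mean_embedding(ys, freqs, phases)
    return sum((a - b) ** 2 for a, b in zip(mx, my))

def gaussian_sample(n, mean, rng):
    """Draw n points from an isotropic unit-variance Gaussian (toy data)."""
    return [tuple(rng.gauss(m, 1.0) for m in mean) for _ in range(n)]

rng = random.Random(0)
freqs, phases = make_rff(dim=2, num_features=200, lengthscale=1.0, rng=rng)

xs = gaussian_sample(300, (0.0, 0.0), rng)
ys_same = gaussian_sample(300, (0.0, 0.0), rng)     # same distribution as xs
ys_shifted = gaussian_sample(300, (3.0, 3.0), rng)  # clearly different distribution

mmd2_same = mmd2_rff(xs, ys_same, freqs, phases)
mmd2_shifted = mmd2_rff(xs, ys_shifted, freqs, phases)
print(f"same: {mmd2_same:.4f}  shifted: {mmd2_shifted:.4f}")
```

Because each sample is reduced to a single fixed-length vector, comparing two samples costs time linear in the sample size rather than quadratic, which is one reason such an approximation keeps test-time overhead small.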

Paper Structure

This paper contains 47 sections, 2 theorems, 20 equations, 25 figures, 1 table, 1 algorithm.

Key Result

Theorem 4.1

Consider any $\mathbb{Q} \in \mathcal{P}(\mathcal{X})$ and $\mathbf{y} \in \mathcal{X}$, and let $\mathbb{Q}_\epsilon = (1-\epsilon) \mathbb{Q} + \epsilon \delta_\mathbf{y}$, for $\epsilon \in [0,1]$. Then, under the robustness assumptions stated in the paper's appendix:

Figures (25)

  • Figure 1: MDS for a bivariate Gaussian model with 20% of observations $\tilde{\mathbf{x}}_{1:100}$ contaminated by a shift of $8$ units. $\mathbf{s}_{\mathrm{ref}}, \tilde{\mathbf{s}}, \mathbf{s}^*$ are the oracle (uncontaminated) summary, test observation (contaminated) summary and adapted MDS summary respectively. Left: MDS aligns the decoder model $q_\omega$ with the contaminated observations $\tilde{\mathbf{x}}$ using a robust divergence. Middle: MDS is able to recover the oracle summary despite aligning on the data space. Right: MDS robustifies the NPE and the NPE is now able to recover the true parameters.
  • Figure 2: Bivariate Gaussian model with outlier contamination, comparing posterior samples against the ground-truth posterior distribution with MMD. Left: Increasing outlier magnitude. Right: Increasing contamination proportion.
  • Figure 3: Time-series models with increasing contamination proportion, measuring the RMSE against the true parameter, posterior predictive against uncontaminated observations with MMD, and distance of adapted summary to the uncontaminated (oracle) summary statistic. Top: Ornstein-Uhlenbeck process. Bottom: SIR model.
  • Figure 4: Cryo-EM inference. Left: RMSE against true parameter. Right: Posterior predictive against uncontaminated observations.
  • Figure 5: Left: Projected image with HSP90 model. Right: Gaussian noise contamination.
  • ...and 20 more figures
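
The contamination setup described for Figures 1 and 2 can be mimicked with a small sketch: 100 bivariate Gaussian observations, 20% of which are shifted. It compares how far the sample mean moves with how far the contaminated sample moves in MMD under a Gaussian kernel. The kernel, its lengthscale, the choice to shift only the first coordinate, and all function names are assumptions made for illustration; this is not the paper's experiment code. The bound $\epsilon\sqrt{2}$ used in the comments follows from the kernel taking values in $(0, 1]$ with $k(\mathbf{x}, \mathbf{x}) = 1$.

```python
import math
import random

def gaussian_kernel(a, b, lengthscale=1.0):
    """Gaussian kernel, bounded in (0, 1] with k(x, x) = 1."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * lengthscale ** 2))

def mmd_vstat(xs, ys, lengthscale=1.0):
    """Biased (V-statistic) MMD between two equally weighted samples."""
    def avg_k(us, vs):
        total = sum(gaussian_kernel(u, v, lengthscale) for u in us for v in vs)
        return total / (len(us) * len(vs))
    sq = avg_k(xs, xs) - 2.0 * avg_k(xs, ys) + avg_k(ys, ys)
    return math.sqrt(max(sq, 0.0))

def contaminate(xs, fraction, magnitude):
    """Shift the first coordinate of the first `fraction` of points (assumed scheme)."""
    n_bad = int(round(fraction * len(xs)))
    return [(x[0] + magnitude, x[1]) if i < n_bad else x
            for i, x in enumerate(xs)]

def mean_displacement(xs, ys):
    """Euclidean distance between the two sample means."""
    n = len(xs)
    return math.hypot(*(sum(y[d] for y in ys) / n - sum(x[d] for x in xs) / n
                        for d in range(2)))

rng = random.Random(0)
clean = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100)]

epsilon = 0.2
results = {}
for magnitude in (8.0, 80.0):
    dirty = contaminate(clean, epsilon, magnitude)
    # The mean moves by epsilon * magnitude, so it grows without limit;
    # the MMD can never exceed epsilon * sqrt(2), however large the shift.
    results[magnitude] = (mean_displacement(clean, dirty), mmd_vstat(clean, dirty))
    print(magnitude, results[magnitude])
```

Making the outliers ten times larger moves the sample mean ten times further, while the MMD stays below the same cap. This bounded influence is the property that a minimum-distance criterion based on the MMD relies on.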

Theorems & Definitions (5)

  • Theorem 4.1
  • proof
  • Theorem 4.2
  • proof
  • Remark 1.1