Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference

Marvin Schmitt; Desi R. Ivanova; Daniel Habermann; Ullrich Köthe; Paul-Christian Bürkner; Stefan T. Radev

TL;DR

This work addresses data efficiency in amortized Bayesian inference by exploiting a symmetry of the joint model $p(\boldsymbol{\theta}, \mathbf{Y})$ under Bayes' rule. The authors invert Bayes' theorem to estimate the marginal likelihood and introduce a self-consistency loss that penalizes the variance of $\log(p(\boldsymbol{\theta}) p(\mathbf{Y}|\boldsymbol{\theta}) / q_{\boldsymbol{\phi}}(\boldsymbol{\theta}|\mathbf{Y}))$ across parameter samples, thereby improving neural posterior (NPE) and neural posterior-likelihood (NPLE) estimators, particularly in low-data regimes. The method works with both explicit and learned likelihoods (the latter via $q_{\boldsymbol{\eta}}(\mathbf{Y}|\boldsymbol{\theta})$), and empirical results across Gaussian mixtures, two moons, Hes1 biology, source localization, and high-dimensional time series show sharper marginal-likelihood estimates, better-calibrated posteriors, and more accurate posterior samples. The approach offers a data-efficient augmentation to SBI/ABI pipelines and can extend to sequential SBI and other density estimators, at the price of added training cost and a reliance on tractable prior densities.

Abstract

We propose a method to improve the efficiency and accuracy of amortized Bayesian inference by leveraging universal symmetries in the joint probabilistic model of parameters and data. In a nutshell, we invert Bayes' theorem and estimate the marginal likelihood based on approximate representations of the joint model. Upon perfect approximation, the marginal likelihood is constant across all parameter values by definition. However, errors in approximate inference lead to undesirable variance in the marginal likelihood estimates across different parameter values. We penalize violations of this symmetry with a \textit{self-consistency loss} which significantly improves the quality of approximate inference in low data regimes and can be used to augment the training of popular neural density estimators. We apply our method to a number of synthetic problems and realistic scientific models, discovering notable advantages in the context of both neural posterior and likelihood approximation.
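The variance penalty described above can be illustrated on a toy model where everything is available in closed form. The following is a minimal sketch, not the authors' implementation: it assumes a conjugate Gaussian model (prior $\theta \sim \mathcal{N}(0, 1)$, likelihood $y \mid \theta \sim \mathcal{N}(\theta, 1)$), a Gaussian stand-in for the approximate posterior, and the prior as the proposal $\pi(\boldsymbol{\theta})$. The function names are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def log_marginal_estimates(theta, y, q_mean, q_var):
    """Inverted Bayes' rule: log p(theta) + log p(y | theta) - log q(theta | y)."""
    log_prior = log_normal_pdf(theta, 0.0, 1.0)      # assumed prior N(0, 1)
    log_lik = log_normal_pdf(y, theta, 1.0)          # assumed likelihood N(theta, 1)
    log_q = log_normal_pdf(theta, q_mean, q_var)     # Gaussian posterior stand-in
    return log_prior + log_lik - log_q

def self_consistency_loss(theta, y, q_mean, q_var):
    """Variance of the log marginal likelihood estimates across theta samples."""
    return np.var(log_marginal_estimates(theta, y, q_mean, q_var))

rng = np.random.default_rng(0)
y = 1.2
theta = rng.normal(0.0, 1.0, size=256)  # proposal pi(theta) = prior (assumption)

# Exact posterior of this conjugate model is N(y / 2, 1 / 2).
loss_exact = self_consistency_loss(theta, y, y / 2, 0.5)
# A posterior with a shifted mean breaks the symmetry.
loss_biased = self_consistency_loss(theta, y, y / 2 + 0.5, 0.5)
# Closed-form log marginal likelihood: y ~ N(0, 2).
exact_log_marginal = log_normal_pdf(y, 0.0, 2.0)
```

With the exact posterior, every parameter sample yields the same marginal likelihood estimate, so `loss_exact` vanishes up to floating-point error. The shifted posterior produces estimates that vary with $\theta$, so `loss_biased` is clearly positive.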

Paper Structure (30 sections, 1 theorem, 24 equations, 11 figures, 4 tables, 1 algorithm)

Introduction
Background
Notation
Neural Posterior Estimation
Neural Posterior and Likelihood Estimation
Limitations of NPE and NPLE
Leveraging Self-Consistency for ABI
Naïve Approach: Direct Constrained Optimization
Variance Penalty and Self-Consistency Loss
Monte Carlo Estimation
Intuition for Benefits of Self-Consistency
Related Work
Empirical Evaluation
Experiment 1: Gaussian Mixture Model
Experiment 2: Two Moons
...and 15 more sections

Key Result

Proposition 1

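Let $\pi(\boldsymbol{\theta})$ be any proposal distribution with the same support as $p(\boldsymbol{\theta} \,|\, \mathbf{Y})$, $\mathbf{Y}$ be a fixed data set, and $f$ be any monotonic function. Then

$$\mathrm{Var}_{\boldsymbol{\theta} \sim \pi(\boldsymbol{\theta})}\left[ f\left( \frac{p(\boldsymbol{\theta})\, p(\mathbf{Y} \,|\, \boldsymbol{\theta})}{q_{\boldsymbol{\phi}}(\boldsymbol{\theta} \,|\, \mathbf{Y})} \right) \right] = 0 \iff q_{\boldsymbol{\phi}}(\boldsymbol{\theta} \,|\, \mathbf{Y}) = p(\boldsymbol{\theta} \,|\, \mathbf{Y}).$$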

Figures (11)

Figure 1: The performance of the posterior approximator is evaluated via the variance of the corresponding marginal likelihood estimates. Top row: For the true posterior (or a perfect approximation thereof), the estimated marginal likelihood is constant for any parameter value ${\boldsymbol{\theta}\sim \pi(\boldsymbol{\theta})}$. Bottom row: For an imperfect approximate posterior, the estimated marginal likelihood varies across different parameter values. Hence, the inherent symmetry of the joint probabilistic model $p(\boldsymbol{\theta}, \mathbf{Y})$ is violated by its approximate representation. Minimizing the variance of the marginal likelihood estimates pushes the estimated marginal likelihood towards uniformity. This restores the symmetry of the unified representation, which is equivalent to improving the approximate posterior.
Figure 2: Experiment 1 (Gaussian Mixture Model). Performance comparison between the NPE baseline (A) and our self-consistent SC-NPE method (B). Pink star $\boldsymbol{\star}$ marks the ground-truth parameter $\boldsymbol{\theta}^*$. Both qualitative assessments (sampling) and quantitative measures (MMD; lower is better) indicate that our SC-NPE method yields significantly better results given the same neural architecture and training budget. Across all simulation budgets, our self-consistent approximator outperforms the NPE baseline, as indexed by improved posterior fidelity (lower MMD) on 100 unseen test instances (C).
Figure 3: Experiment 2 (Two Moons). Our self-consistency loss yields a lower posterior error (MMD) than the baseline NPLE algorithm on the test set with equal architecture. We repeat the experiment 5 times on the same training set; plots show the median, best, and worst run.
Figure 4: Experiment 3 (Hes1 Expression). The baseline NPLE approximator shows deficient simulation-based calibration, as indexed by ECDF lines outside the gray 95% confidence bands (A). In contrast, our self-consistent approximator is well-calibrated (C). Samples from the posterior predictive distribution on real experimental data [Silk2011] are comparable between NPLE (B) and SC (D).
Figure 5: Experiment 1 (Gaussian mixture model). All approximators are well-calibrated.
...and 6 more figures
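
The Figure 1 caption states that minimizing the variance of the marginal likelihood estimates improves the approximate posterior. This can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the training procedure from the paper: it assumes a conjugate Gaussian model (prior $\mathcal{N}(0, 1)$, likelihood $\mathcal{N}(\theta, 1)$, exact posterior $\mathcal{N}(y/2, 1/2)$), fixes the posterior variance at its true value, and replaces gradient-based training with a grid search over the posterior mean. All names are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def variance_penalty(theta, y, q_mean, q_var):
    """Variance over theta of log p(theta) + log p(y | theta) - log q(theta | y)."""
    log_joint = log_normal_pdf(theta, 0.0, 1.0) + log_normal_pdf(y, theta, 1.0)
    return np.var(log_joint - log_normal_pdf(theta, q_mean, q_var))

rng = np.random.default_rng(1)
y = 0.8
theta = rng.normal(0.0, 1.0, size=512)  # samples from the proposal (here: the prior)

# Grid search over the mean of the Gaussian posterior stand-in (step 0.01).
candidate_means = np.linspace(-1.0, 2.0, 301)
penalties = np.array(
    [variance_penalty(theta, y, m, 0.5) for m in candidate_means]
)
best_mean = candidate_means[np.argmin(penalties)]
```

The penalty is smallest at the grid point that coincides with the exact posterior mean $y/2$, which mirrors the caption's point that restoring the symmetry and improving the posterior are the same thing in this setting.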

Theorems & Definitions (2)

Proposition 1
proof
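
The reasoning behind Proposition 1 can be sketched in a few lines from Bayes' rule. This is an illustrative sketch, not the authors' proof. It assumes that $f$ is strictly monotonic (hence injective) and that $q_{\boldsymbol{\phi}}(\boldsymbol{\theta} \,|\, \mathbf{Y})$ is a normalized density; the symbol $\hat{p}_{\boldsymbol{\phi}}$ for the marginal likelihood estimate is notation introduced here.

$$
\begin{aligned}
\hat{p}_{\boldsymbol{\phi}}(\mathbf{Y}; \boldsymbol{\theta})
  &:= \frac{p(\boldsymbol{\theta})\, p(\mathbf{Y} \,|\, \boldsymbol{\theta})}{q_{\boldsymbol{\phi}}(\boldsymbol{\theta} \,|\, \mathbf{Y})}
   = p(\mathbf{Y})\, \frac{p(\boldsymbol{\theta} \,|\, \mathbf{Y})}{q_{\boldsymbol{\phi}}(\boldsymbol{\theta} \,|\, \mathbf{Y})}, \\
q_{\boldsymbol{\phi}}(\boldsymbol{\theta} \,|\, \mathbf{Y}) = p(\boldsymbol{\theta} \,|\, \mathbf{Y})
  &\;\Longrightarrow\; \hat{p}_{\boldsymbol{\phi}}(\mathbf{Y}; \boldsymbol{\theta}) = p(\mathbf{Y}) \ \text{for all } \boldsymbol{\theta}
  \;\Longrightarrow\; \mathrm{Var}_{\boldsymbol{\theta} \sim \pi(\boldsymbol{\theta})}\big[ f(\hat{p}_{\boldsymbol{\phi}}) \big] = 0, \\
\mathrm{Var}_{\boldsymbol{\theta} \sim \pi(\boldsymbol{\theta})}\big[ f(\hat{p}_{\boldsymbol{\phi}}) \big] = 0
  &\;\Longrightarrow\; \frac{p(\boldsymbol{\theta} \,|\, \mathbf{Y})}{q_{\boldsymbol{\phi}}(\boldsymbol{\theta} \,|\, \mathbf{Y})} = c \quad \pi\text{-almost everywhere} \\
  &\;\Longrightarrow\; 1 = \int p(\boldsymbol{\theta} \,|\, \mathbf{Y})\, d\boldsymbol{\theta}
   = c \int q_{\boldsymbol{\phi}}(\boldsymbol{\theta} \,|\, \mathbf{Y})\, d\boldsymbol{\theta} = c.
\end{aligned}
$$

The first line rewrites the estimate using $p(\boldsymbol{\theta})\, p(\mathbf{Y} \,|\, \boldsymbol{\theta}) = p(\mathbf{Y})\, p(\boldsymbol{\theta} \,|\, \mathbf{Y})$. The last two lines use the shared support of $\pi$ and the posterior: a zero-variance estimate forces the density ratio to be a constant, and normalization pins that constant to one, so the approximate posterior matches the true posterior.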