Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference

Marvin Schmitt, Desi R. Ivanova, Daniel Habermann, Ullrich Köthe, Paul-Christian Bürkner, Stefan T. Radev

TL;DR

This work addresses data efficiency in amortized Bayesian inference by exploiting a symmetry of the joint model $p(\boldsymbol{\theta}, \mathbf{Y})$ under Bayes' rule. The authors invert Bayes' theorem to monitor the marginal likelihood and introduce a self-consistency loss that penalizes the variance of $\log(p(\boldsymbol{\theta}) p(\mathbf{Y}|\boldsymbol{\theta}) / q_{\boldsymbol{\phi}}(\boldsymbol{\theta}|\mathbf{Y}))$ across parameter samples, thereby improving neural posterior (NPE) and neural posterior-likelihood (NPLE) estimators, particularly in low-data regimes. The method works with both explicit likelihoods and learned likelihoods (via $q_{\boldsymbol{\eta}}(\mathbf{Y}|\boldsymbol{\theta})$). Empirical results on Gaussian mixtures, two moons, Hes1 expression, source localization, and high-dimensional time series show sharper marginal-likelihood estimates, better-calibrated posteriors, and more accurate posterior samples. The approach offers a data-efficient augmentation to SBI/ABI pipelines and can extend to sequential SBI and other density estimators, at the price of added training cost and a reliance on tractable prior densities.
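
As a short worked equation for the symmetry mentioned above: Bayes' rule can be solved for the marginal likelihood, and the right-hand side does not depend on $\boldsymbol{\theta}$ when the true posterior is used. The symbols $\widehat{p}_{\boldsymbol{\phi}}$ and $\pi$ below are notation assumed here for illustration (with $\pi$ a proposal over parameters); the exact weighting of the penalty in the training objective is not stated in this summary.

$$
p(\mathbf{Y}) = \frac{p(\boldsymbol{\theta})\, p(\mathbf{Y}|\boldsymbol{\theta})}{p(\boldsymbol{\theta}|\mathbf{Y})}
\quad\Longrightarrow\quad
\log \widehat{p}_{\boldsymbol{\phi}}(\mathbf{Y};\boldsymbol{\theta}) = \log p(\boldsymbol{\theta}) + \log p(\mathbf{Y}|\boldsymbol{\theta}) - \log q_{\boldsymbol{\phi}}(\boldsymbol{\theta}|\mathbf{Y}),
\qquad
\mathrm{Var}_{\boldsymbol{\theta}\sim\pi}\!\left[\log \widehat{p}_{\boldsymbol{\phi}}(\mathbf{Y};\boldsymbol{\theta})\right] \ge 0 .
$$

If $q_{\boldsymbol{\phi}}(\boldsymbol{\theta}|\mathbf{Y}) = p(\boldsymbol{\theta}|\mathbf{Y})$, the middle expression equals $\log p(\mathbf{Y})$ for every $\boldsymbol{\theta}$ and the variance is zero; any dependence on $\boldsymbol{\theta}$ signals approximation error in $q_{\boldsymbol{\phi}}$.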

Abstract

We propose a method to improve the efficiency and accuracy of amortized Bayesian inference by leveraging universal symmetries in the joint probabilistic model of parameters and data. In a nutshell, we invert Bayes' theorem and estimate the marginal likelihood based on approximate representations of the joint model. Upon perfect approximation, the marginal likelihood is constant across all parameter values by definition. However, errors in approximate inference lead to undesirable variance in the marginal likelihood estimates across different parameter values. We penalize violations of this symmetry with a self-consistency loss which significantly improves the quality of approximate inference in low data regimes and can be used to augment the training of popular neural density estimators. We apply our method to a number of synthetic problems and realistic scientific models, discovering notable advantages in the context of both neural posterior and likelihood approximation.
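
The idea that the estimated marginal likelihood is constant under a perfect posterior, and varies under an imperfect one, can be checked numerically on a toy problem. The sketch below is not the authors' implementation: it assumes a conjugate Gaussian model (prior $\mathcal{N}(0, 1)$, likelihood $\mathcal{N}(\theta, 1)$, one scalar observation) where the exact posterior $\mathcal{N}(y/2, 1/2)$ and the evidence $\mathcal{N}(y; 0, 2)$ are known in closed form, and it uses the prior as the proposal. All function names are hypothetical, and the "biased" posterior stands in for an imperfect neural approximator.

```python
import numpy as np


def normal_logpdf(x, mean, var):
    """Log density of a univariate normal with the given mean and variance."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def log_marginal_estimates(theta, y, log_q):
    """Inverted Bayes' rule: log p(theta) + log p(y | theta) - log q(theta | y)."""
    log_prior = normal_logpdf(theta, 0.0, 1.0)   # assumed prior: N(0, 1)
    log_lik = normal_logpdf(y, theta, 1.0)       # assumed likelihood: N(theta, 1)
    return log_prior + log_lik - log_q(theta, y)


def self_consistency_penalty(theta, y, log_q):
    """Variance of the log marginal likelihood estimates across parameter samples."""
    return np.var(log_marginal_estimates(theta, y, log_q))


def exact_log_q(theta, y):
    """Exact posterior of the toy model: N(y / 2, 1 / 2)."""
    return normal_logpdf(theta, y / 2.0, 0.5)


def biased_log_q(theta, y):
    """Deliberately wrong posterior (shifted mean, inflated variance)."""
    return normal_logpdf(theta, y / 2.0 + 0.3, 0.8)


rng = np.random.default_rng(0)
y_obs = 1.2
theta_samples = rng.normal(0.0, 1.0, size=256)  # proposal = prior (an assumption)

sc_exact = self_consistency_penalty(theta_samples, y_obs, exact_log_q)
sc_biased = self_consistency_penalty(theta_samples, y_obs, biased_log_q)
log_evidence = normal_logpdf(y_obs, 0.0, 2.0)   # closed-form log p(y) for this model

print(f"penalty with exact posterior:  {sc_exact:.3e}")
print(f"penalty with biased posterior: {sc_biased:.3e}")
```

With the exact posterior, every estimate coincides with the closed-form log evidence, so the penalty vanishes up to floating-point error; with the biased posterior it is clearly positive. In a training setting this variance would be computed on differentiable log densities and added to the usual loss, which this NumPy sketch does not attempt.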

Paper Structure (30 sections, 1 theorem, 24 equations, 11 figures, 4 tables, 1 algorithm)

This paper contains 30 sections, 1 theorem, 24 equations, 11 figures, 4 tables, 1 algorithm.

Key Result

Proposition 1

Let $\pi(\boldsymbol{\theta})$ be any proposal distribution with the same support as $p(\boldsymbol{\theta} \,|\, \mathbf{Y})$, $\mathbf{Y}$ be a fixed data set, and $f$ be any monotonic function, then

Figures (11)

  • Figure 1: The performance of the posterior approximator is evaluated via the variance of the corresponding marginal likelihood estimates. Top row: For the true posterior (or a perfect approximation thereof), the estimated marginal likelihood is constant for any parameter value ${\boldsymbol{\theta}\sim \pi(\boldsymbol{\theta})}$. Bottom row: For an imperfect approximate posterior, the estimated marginal likelihood varies across different parameter values. Hence, the inherent symmetry of the joint probabilistic model $p(\boldsymbol{\theta}, \mathbf{Y})$ is violated by its approximate representation. Minimizing the variance of the marginal likelihood estimates pushes the estimated marginal likelihood towards uniformity. This restores the symmetry of the unified representation, which is equivalent to improving the approximate posterior.
  • Figure 2: Experiment 1 (Gaussian Mixture Model). Performance comparison between the NPE baseline (A) and our self-consistent SC-NPE method (B). Pink star $\boldsymbol{\star}$ marks the ground-truth parameter $\boldsymbol{\theta}^*$. Both qualitative assessments (sampling) and quantitative measures (MMD; lower is better) indicate that our SC-NPE method yields significantly better results given the same neural architecture and training budget. Across all simulation budgets, our self-consistent approximator outperforms the NPE baseline, as indexed by improved posterior fidelity (lower MMD) on 100 unseen test instances (C).
  • Figure 3: Experiment 2 (Two Moons). Our self-consistency loss yields a lower posterior error (MMD) on the test set than the baseline NPLE algorithm with an identical architecture. We repeat the experiment 5 times on the same training set; plots show the median, best, and worst run.
  • Figure 4: Experiment 3 (Hes1 Expression). The baseline NPLE approximator shows deficient simulation-based calibration, as indexed by ECDF lines outside the gray 95% confidence bands (A). In contrast, our self-consistent approximator is well-calibrated (C). Samples from the posterior predictive distribution on real experimental data (Silk et al., 2011) are comparable between NPLE (B) and SC (D).
  • Figure 5: Experiment 1 (Gaussian mixture model). All approximators are well-calibrated.
  • ...and 6 more figures

Theorems & Definitions (2)

  • Proposition 1
  • proof