Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference
Marvin Schmitt, Desi R. Ivanova, Daniel Habermann, Ullrich Köthe, Paul-Christian Bürkner, Stefan T. Radev
TL;DR
This work addresses data efficiency in amortized Bayesian inference (ABI) by exploiting a symmetry of the joint model $p(\boldsymbol{\theta}, \mathbf{Y})$ under Bayes' rule. The authors invert Bayes' theorem to monitor the marginal likelihood and introduce a self-consistency loss that penalizes the variance of $\log\big(p(\boldsymbol{\theta})\, p(\mathbf{Y}|\boldsymbol{\theta}) / q_{\boldsymbol{\phi}}(\boldsymbol{\theta}|\mathbf{Y})\big)$ across parameter samples for a fixed data set $\mathbf{Y}$, thereby improving neural posterior (NPE) and neural posterior-likelihood (NPLE) estimators, particularly in low-data regimes. The method works with both explicit likelihoods and learned ones (via a surrogate $q_{\boldsymbol{\eta}}(\mathbf{Y}|\boldsymbol{\theta})$). Empirical results across Gaussian mixtures, two moons, Hes1 biology, source localization, and high-dimensional time series show sharper marginal-likelihood estimates, better-calibrated posteriors, and more accurate posterior samples. The approach offers a data-efficient augmentation to simulation-based inference (SBI) and ABI pipelines and can extend to sequential SBI and other density estimators, albeit with added training cost and a reliance on tractable prior densities.
Abstract
We propose a method to improve the efficiency and accuracy of amortized Bayesian inference by leveraging universal symmetries in the joint probabilistic model of parameters and data. In a nutshell, we invert Bayes' theorem and estimate the marginal likelihood based on approximate representations of the joint model. Upon perfect approximation, the marginal likelihood is constant across all parameter values by definition. However, errors in approximate inference lead to undesirable variance in the marginal likelihood estimates across different parameter values. We penalize violations of this symmetry with a \textit{self-consistency loss}, which significantly improves the quality of approximate inference in low-data regimes and can be used to augment the training of popular neural density estimators. We apply our method to a number of synthetic problems and realistic scientific models, discovering notable advantages in the context of both neural posterior and likelihood approximation.
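The symmetry described above can be checked numerically on a toy model. The following is a minimal sketch, not the authors' implementation: it assumes a conjugate Gaussian model ($\theta \sim \mathcal{N}(0, 1)$, $y \mid \theta \sim \mathcal{N}(\theta, 1)$) whose exact posterior $\mathcal{N}(y/2, 1/2)$ and marginal likelihood $\mathcal{N}(y; 0, 2)$ are known in closed form, and it stands in for the neural approximator with a plain Gaussian density. The function names, the toy model, and the choice to draw parameter samples from the prior are all assumptions made for illustration.

```python
import numpy as np


def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def log_marginal_estimates(y, thetas, post_mean, post_var):
    """Inverted Bayes' rule: log p(theta) + log p(y | theta) - log q(theta | y).

    Assumed toy model: theta ~ N(0, 1), y | theta ~ N(theta, 1).
    q(theta | y) is a Gaussian stand-in for a learned posterior approximator.
    """
    log_prior = log_normal_pdf(thetas, 0.0, 1.0)
    log_lik = log_normal_pdf(y, thetas, 1.0)
    log_q = log_normal_pdf(thetas, post_mean, post_var)
    return log_prior + log_lik - log_q


def self_consistency_loss(y, thetas, post_mean, post_var):
    """Variance of the log marginal likelihood estimates across parameter samples."""
    return np.var(log_marginal_estimates(y, thetas, post_mean, post_var))


rng = np.random.default_rng(0)
y = 1.3
# Parameter values at which the symmetry is checked (drawn from the prior here,
# purely for simplicity of the illustration).
thetas = rng.normal(size=16)

# Exact posterior of the toy model: N(y / 2, 1 / 2).
estimates_exact = log_marginal_estimates(y, thetas, y / 2.0, 0.5)
loss_exact = self_consistency_loss(y, thetas, y / 2.0, 0.5)

# A deliberately shifted posterior mean violates the symmetry.
loss_biased = self_consistency_loss(y, thetas, y / 2.0 + 0.5, 0.5)

# Closed-form log marginal likelihood of the toy model: y ~ N(0, 2).
log_marginal_true = log_normal_pdf(y, 0.0, 2.0)

print("loss with exact posterior:  ", loss_exact)
print("loss with shifted posterior:", loss_biased)
```

With the exact posterior, every parameter sample yields the same log marginal likelihood estimate, so the variance vanishes up to floating-point error. With the shifted posterior, the estimates depend on $\theta$ and the variance is strictly positive, which is the signal that a self-consistency penalty of this kind would push toward zero during training.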
