SoftCVI: Contrastive variational inference with self-generated soft labels

Daniel Ward, Mark Beaumont, Matteo Fasiolo

TL;DR

SoftCVI addresses posterior inference with unnormalized densities by reframing variational optimization as contrastive classification over samples drawn from a proposal. It learns a variational posterior $q_{\bm{\phi}}(\bm{\theta})$ by generating ground-truth soft labels from the unnormalized density. These labels come from a softmax over log-density ratios taken against a tempered negative distribution $p^-(\bm{\theta}) = (\pi(\bm{\theta}))^{\alpha}$, and the variational distribution itself serves as the proposal. The resulting objectives are stable to train and mass-covering, and their gradients have zero variance when the approximation matches the true posterior. Empirically, SoftCVI frequently outperforms other variational approaches across Bayesian inference tasks, using both simple Gaussian and expressive normalizing-flow variational families. Overall, SoftCVI bridges variational inference and contrastive learning, offering practical benefits for robust uncertainty quantification in complex Bayesian models.
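The label-generation step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. The function name `softcvi_loss` is hypothetical, as is the toy conjugate model. The classifier logits are assumed to take the form $\log q_{\bm{\phi}}(\bm{\theta}_k) - \log p^-(\bm{\theta}_k)$. Finally, $\pi$ is taken to be the prior, with $\alpha = 0.75$. For $K$ samples drawn from $q_{\bm{\phi}}$, the soft labels are a softmax over $\log \tilde p(\bm{\theta}_k) - \log p^-(\bm{\theta}_k)$, and the loss is the cross-entropy between those labels and the classifier's softmax.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over a 1-D array."""
    z = z - np.max(z)
    return z - np.log(np.sum(np.exp(z)))


def softcvi_loss(log_q, log_post_unnorm, log_neg):
    """Soft-label cross-entropy for one set of K samples drawn from q (hypothetical helper).

    All inputs are arrays of shape (K,) evaluated at the same samples:
    log_q           -- log q_phi(theta_k), the variational (proposal) density
    log_post_unnorm -- log of the unnormalized posterior at theta_k
    log_neg         -- log of the (unnormalized) negative density p^-(theta_k)
    """
    # Soft labels come from the unnormalized posterior itself; any
    # constant shared by all K samples cancels inside the softmax.
    labels = np.exp(log_softmax(log_post_unnorm - log_neg))
    # Assumed classifier parameterization: logits are log q - log p^-.
    log_probs = log_softmax(log_q - log_neg)
    loss = -np.sum(labels * log_probs)
    # Gradient of the loss w.r.t. the logits, with the labels held fixed.
    grad_logits = np.exp(log_probs) - labels
    return loss, grad_logits


def log_normal(theta, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - (theta - mean) ** 2 / (2 * var)


# Toy model (illustrative): prior N(0, 1), likelihood N(x_obs | theta, 1).
# With x_obs = 1 the exact posterior is N(0.5, 0.5).
rng = np.random.default_rng(0)
x_obs, alpha, K = 1.0, 0.75, 8


def evaluate(q_mean, q_var):
    theta = rng.normal(q_mean, np.sqrt(q_var), size=K)  # the proposal is q itself
    log_prior = -0.5 * theta**2                          # unnormalized prior
    log_post_unnorm = log_prior - 0.5 * (x_obs - theta) ** 2
    log_neg = alpha * log_prior                          # tempered negative, pi taken as the prior
    return softcvi_loss(log_normal(theta, q_mean, q_var), log_post_unnorm, log_neg)


loss_exact, grad_exact = evaluate(0.5, 0.5)  # q equals the posterior
loss_off, grad_off = evaluate(0.0, 1.0)      # q is misspecified
print(np.max(np.abs(grad_exact)))  # zero up to floating-point error
print(np.max(np.abs(grad_off)))    # clearly non-zero
```

When $q$ equals the posterior, $\log q - \log \tilde p$ is constant across samples. The classifier's softmax then reproduces the labels exactly, and the logit gradient vanishes on every draw rather than only in expectation.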

Abstract

Estimating a distribution given access to its unnormalized density is pivotal in Bayesian inference, where the posterior is generally known only up to an unknown normalizing constant. Variational inference and Markov chain Monte Carlo methods are the predominant tools for this task; however, both are often challenging to apply reliably, particularly when the posterior has complex geometry. Here, we introduce Soft Contrastive Variational Inference (SoftCVI), which allows a family of variational objectives to be derived through a contrastive estimation framework. The approach parameterizes a classifier in terms of a variational distribution, reframing the inference task as a contrastive estimation problem aiming to identify a single true posterior sample among a set of samples. Despite this framing, we do not require positive or negative samples, but rather learn by sampling the variational distribution and computing ground truth soft classification labels from the unnormalized posterior itself. The objectives have zero-variance gradients when the variational approximation is exact, without the need for specialized gradient estimators. We empirically investigate the performance on a variety of Bayesian inference tasks, using both simple (e.g., normal) and expressive (normalizing flow) variational distributions. We find that SoftCVI can be used to form objectives which are stable to train and mass-covering, frequently outperforming inference with other variational approaches.
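The zero-variance claim can be checked numerically on a toy problem. The sketch below is a hypothetical NumPy illustration rather than the authors' code. It differentiates the soft-label loss with respect to the mean $\mu$ of a Gaussian $q = \mathcal{N}(\mu, \sigma^2)$, assuming that no gradient flows through the sampled $\bm{\theta}_k$ or the labels. It then estimates the gradient's signal and noise over repeated draws, defining signal as the absolute mean and noise as the standard deviation; these definitions are also assumptions. The conjugate model, the prior-based tempered negative, and the names `grad_wrt_mean` and `signal_and_noise` are all illustrative.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e)


def grad_wrt_mean(mu, var, rng, x_obs=1.0, alpha=0.75, K=8):
    """One stochastic gradient of the soft-label loss w.r.t. the mean of q = N(mu, var).

    Samples and labels are treated as constants (no gradient flows through them).
    Toy model (illustrative): prior N(0, 1), likelihood N(x_obs | theta, 1).
    """
    theta = rng.normal(mu, np.sqrt(var), size=K)
    log_q = -0.5 * np.log(2 * np.pi * var) - (theta - mu) ** 2 / (2 * var)
    log_prior = -0.5 * theta**2
    log_post_unnorm = log_prior - 0.5 * (x_obs - theta) ** 2
    log_neg = alpha * log_prior  # tempered negative; pi assumed to be the prior
    labels = softmax(log_post_unnorm - log_neg)
    probs = softmax(log_q - log_neg)
    # Chain rule: dLoss/dmu = sum_k (probs_k - labels_k) * d log q(theta_k) / dmu.
    dlogq_dmu = (theta - mu) / var
    return np.sum((probs - labels) * dlogq_dmu)


def signal_and_noise(mu, var, n_draws=2000, seed=0):
    """Absolute mean and standard deviation of the gradient over repeated draws."""
    rng = np.random.default_rng(seed)
    grads = np.array([grad_wrt_mean(mu, var, rng) for _ in range(n_draws)])
    return np.abs(grads.mean()), grads.std()


# The exact posterior for x_obs = 1 is N(0.5, 0.5): both signal and noise vanish.
print(signal_and_noise(0.5, 0.5))
# A misspecified q = N(0, 1) gives noisy, non-zero gradients.
print(signal_and_noise(0.0, 1.0))
```

No variance-reduction trick is involved here. The noise disappears because every term $q(\bm{\theta}_k)$-softmax minus label is identically zero at the optimum, regardless of which samples were drawn.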

Paper Structure

This paper contains 26 sections, 38 equations, 10 figures, 1 table, and 1 algorithm.

Figures (10)

  • Figure 1: The posterior performance metrics (see the metrics section). a) The nominal coverage frequency against the average difference between the nominal and actual coverage frequencies. Well-calibrated methods follow the black dotted line ($y=0$), whereas conservative methods fall above it and overconfident methods below. b) The average probability of the reference posterior samples under the approximate posterior. c) The accuracy of the approximate posterior mean, calculated as the negative $L^2$-norm of the difference between the means of the standardized reference and approximate posterior samples.
  • Figure 2: A 2-dimensional posterior marginal for a single run of the task, with the reference posterior samples shown in green.
  • Figure 3: The signal, noise and signal-to-noise ratio of the objective gradients on a toy normal task of varying dimensionality. When $d=50$, the gradient properties are computed parameter-wise and averaged. The vertical dashed line shows the true parameter values from the closed form posterior solution.
  • Figure 4: The manifolds learned by variational autoencoders on the MNIST dataset, trained using either the or . To enable training of the model parameters, the objective was modified by adding the model component of the , $-\frac{1}{K}\sum_{k=1}^K \log p_{\bm{\psi}}(\bm{\theta}_k, \bm{x}_\text{obs})$. In all cases, the objectives were trained for 100,000 steps with a batch size of 1 and $K=8$.
  • Figure 5: The means and 95% prediction intervals for the components of a Bayesian neural additive model for each method. The true underlying components are shown with the dotted black lines. We restrict to the first three dimensions, ignoring the nuisance variables.
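The posterior-mean accuracy in Figure 1c can be computed directly from two sample sets. This is a minimal sketch rather than the paper's evaluation code. It assumes that both sets are standardized using the reference samples' per-dimension mean and standard deviation, because the caption does not say which statistics are used. The function name `mean_accuracy` is hypothetical.

```python
import numpy as np


def mean_accuracy(reference, approx):
    """Negative L2 distance between the means of standardized sample sets.

    reference, approx: arrays of shape (n_samples, dim).
    Standardization uses the reference samples' per-dimension mean and std (assumed).
    Returns 0 for a perfect match; more negative values are worse.
    """
    loc = reference.mean(axis=0)
    scale = reference.std(axis=0)
    ref_std = (reference - loc) / scale
    approx_std = (approx - loc) / scale
    return -np.linalg.norm(ref_std.mean(axis=0) - approx_std.mean(axis=0))


rng = np.random.default_rng(1)
reference = rng.normal([0.0, 5.0], [1.0, 10.0], size=(5000, 2))
# Shift dimension 0 by exactly one reference standard deviation.
shifted = reference + np.array([reference[:, 0].std(), 0.0])
print(mean_accuracy(reference, reference))  # zero for identical samples
print(mean_accuracy(reference, shifted))    # about -1
```

Standardizing before taking the norm puts each parameter on a comparable scale. Without it, the second dimension in this example would dominate the distance simply because its reference spread is ten times larger.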