You Only Accept Samples Once: Fast, Self-Correcting Stochastic Variational Inference

Dominic B. Dayta

TL;DR

The paper addresses high-variance gradient estimates in stochastic VI by reducing per-iteration sampling to a single Monte Carlo draw and applying an acceptance rule that favors ELBO improvement. It formalizes two YOASOVI formulations (naive acceptance and a Metropolis-type scheme) with tempered acceptance and early stopping, so that updates are self-correcting and convergence is more robust. Empirical results on Gaussian mixtures and benchmark clustering tasks show that YOASOVI converges faster and reaches better ELBO and DIC values than the standard MCVI and QMCVI baselines, while remaining general and implementable in common software stacks. The approach reduces the computational burden of VI, making it feasible on large hierarchical models and modest hardware without sacrificing accuracy.

Abstract

We introduce YOASOVI, an algorithm for performing fast, self-correcting stochastic optimization for Variational Inference (VI) on large Bayesian hierarchical models. To accomplish this, we take advantage of available information on the objective function used for stochastic VI at each iteration and replace regular Monte Carlo sampling with acceptance sampling. Rather than spend computational resources drawing and evaluating over a large sample for the gradient, we draw only one sample and accept it with probability proportional to the expected improvement in the objective. The following paper develops two versions of the algorithm: the first one based on a naive intuition, and another building up the algorithm as a Metropolis-type scheme. Empirical results based on simulations and benchmark datasets for multivariate Gaussian mixture models show that YOASOVI consistently converges faster (in clock time) and within better optimal neighborhoods than both regularized Monte Carlo and Quasi-Monte Carlo VI algorithms.
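
The single-draw, accept-or-reject idea described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the toy target (an unnormalized Gaussian with mean 3), the Gaussian variational family, the acceptance formula min(1, exp(delta / temperature)), and every name below (elbo_hat, grad_hat, accept_prob, run_sketch, temperature) are assumptions made for illustration. The sketch only shows the control flow: one Monte Carlo draw per iteration, a proposed gradient step, and a tempered Metropolis-style test against the last accepted ELBO estimate.

```python
import math
import random

TARGET_MEAN = 3.0  # assumed toy target: log p(z) = -0.5 * (z - 3)^2 + const


def elbo_hat(mu, log_sigma, eps):
    """Single-draw ELBO estimate for q = N(mu, sigma^2), up to a constant."""
    z = mu + math.exp(log_sigma) * eps  # reparameterized draw
    return -0.5 * (z - TARGET_MEAN) ** 2 + log_sigma  # log p(z) + entropy term


def grad_hat(mu, log_sigma, eps):
    """Single-draw reparameterization gradient of elbo_hat."""
    sigma = math.exp(log_sigma)
    z = mu + sigma * eps
    d_mu = -(z - TARGET_MEAN)
    d_log_sigma = -(z - TARGET_MEAN) * sigma * eps + 1.0
    return d_mu, d_log_sigma


def accept_prob(delta, temperature):
    """Assumed tempered Metropolis-style rule: always accept improvements."""
    if delta >= 0:
        return 1.0
    return math.exp(delta / temperature)


def exact_elbo(mu, log_sigma):
    """Closed-form ELBO of the toy problem (same constant dropped)."""
    sigma_sq = math.exp(2.0 * log_sigma)
    return -0.5 * ((mu - TARGET_MEAN) ** 2 + sigma_sq) + log_sigma


def run_sketch(seed=0, n_iters=3000, lr=0.02, temperature=1.0):
    rng = random.Random(seed)
    mu, log_sigma = 0.0, 0.0
    current_elbo = float("-inf")  # no accepted estimate yet
    n_accepted = 0
    for _ in range(n_iters):
        eps = rng.gauss(0.0, 1.0)  # the only Monte Carlo draw this iteration
        d_mu, d_ls = grad_hat(mu, log_sigma, eps)
        new_mu, new_ls = mu + lr * d_mu, log_sigma + lr * d_ls
        new_elbo = elbo_hat(new_mu, new_ls, eps)
        # Compare against the estimate stored at the last accepted step.
        if rng.random() < accept_prob(new_elbo - current_elbo, temperature):
            mu, log_sigma, current_elbo = new_mu, new_ls, new_elbo
            n_accepted += 1
    return mu, log_sigma, n_accepted


if __name__ == "__main__":
    mu, log_sigma, n_accepted = run_sketch()
    print(f"mu={mu:.3f} sigma={math.exp(log_sigma):.3f} accepted={n_accepted}")
```

Because the stored ELBO value is itself a noisy single-draw estimate, this toy rule tends to favor draws that happen to land near the target, so the fitted variance should not be read as an accurate posterior estimate. The point is the accept-or-reject structure, in which a poor draw is likely to be discarded instead of being averaged out over a large sample.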

Paper Structure (13 sections, 11 equations, 2 figures, 1 table, 2 algorithms)

Figures (2)

  • Figure 1: Left: probabilities of new ELBO values under the negative exponential distribution. The negative exponential represents the desired property that the ELBO should be higher rather than lower, with $\beta$ encoding how strictly this requirement is enforced. Right: resulting acceptance probabilities in the Metropolis and naive formulations of YOASOVI. Note that the naive formulation results in narrower probabilities for the same value of $M$, in this case $M = 1.5$.
  • Figure 2: ELBO trajectories of BBVI-JS+, QMCVI, and YOASOVI within the first 50 and 100 seconds of runtime on two benchmark datasets from FCPS. YOASOVI separates itself clearly from the other two algorithms with a strongly increasing trajectory.