Markov Chain Monte Carlo and Variational Inference: Bridging the Gap

Tim Salimans, Diederik P. Kingma, Max Welling

TL;DR

The paper addresses intractable Bayesian posteriors by combining variational inference with MCMC through auxiliary variables, giving a flexible posterior approximation that can be trained with stochastic gradients. It introduces the auxiliary variational lower bound $\mathcal{L}_{aux}$ and develops Markov Chain Variational Inference (MCVI) and Hamiltonian Variational Inference (HVI), which integrate MCMC steps such as Gibbs sampling, over-relaxation, and Hamiltonian dynamics into the variational framework. It analyzes practical chain specifications, including detailed balance, annealed inference, multiple iterates, and sequential MCVI, and shows how these choices tighten the bound and improve convergence. In experiments ranging from simple Gaussian targets to deep generative models on MNIST, the approach yields tighter lower bounds, more accurate posterior approximations, and faster optimization, which suggests it can combine the speed of VI with the accuracy of MCMC for scalable Bayesian inference.
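
For orientation, the auxiliary bound mentioned above can be written out as follows. This is a sketch under assumed notation that the summary itself does not define: $x$ denotes the observed data, $z$ the latent variables, $y$ the auxiliary variables (for example, the intermediate states of the Markov chain), $q(y,z|x)$ the joint variational approximation, and $r(y|x,z)$ an auxiliary inference distribution chosen by the modeller.

$$
\begin{aligned}
\mathcal{L}_{aux} &= \mathbb{E}_{q(y,z|x)}\left[\log p(x,z) + \log r(y|x,z) - \log q(y,z|x)\right] \\
&= \mathcal{L} - \mathbb{E}_{q(z|x)}\left\{ D_{KL}\left[\, q(y|z,x) \,\|\, r(y|x,z) \,\right] \right\} \\
&\le \mathcal{L} \le \log p(x)
\end{aligned}
$$

Here $\mathcal{L}$ is the usual variational lower bound computed with the marginal $q(z|x) = \int q(y,z|x)\,dy$. The auxiliary variables make the marginal approximation richer, at the cost of a KL penalty that vanishes only when $r(y|x,z)$ matches $q(y|z,x)$ exactly.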

Abstract

Recent advances in stochastic gradient variational inference have made it possible to perform variational Bayesian inference with posterior approximations containing auxiliary random variables. This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation. By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy. We describe the theoretical foundations that make this possible and show some promising first results.

Paper Structure

This paper contains 14 sections, 14 equations, 3 figures, 1 table, 4 algorithms.

Figures (3)

  • Figure 1: The log marginal likelihood lower bound for a bivariate Gaussian target and an MCMC variational approximation, using Gibbs sampling or Adler's over-relaxation.
  • Figure 2: Approximate posteriors for a varying number of leapfrog steps. Exact posterior at bottom right.
  • Figure 3: R-squared accuracy measure (Salimans and Knowles, 2013) for approximate posteriors using a varying number of leapfrog steps.
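
The two transition types named in the Figure 1 caption can be illustrated with a minimal sketch. It assumes a zero-mean bivariate Gaussian target with unit variances and correlation `rho`. The function names and the parameter `alpha` are hypothetical and not taken from the paper. The sketch only draws samples and does not compute the lower bound that the figure plots. Setting `alpha = 0` gives plain Gibbs sampling, while a negative `alpha` in (-1, 0) gives an over-relaxed update that still leaves each Gaussian conditional invariant.

```python
import math
import random


def overrelaxed_gibbs(rho, alpha, n_sweeps, seed=0):
    """Sample a zero-mean, unit-variance bivariate Gaussian with correlation rho.

    Illustrative sketch only: alpha = 0 is ordinary Gibbs sampling; alpha in
    (-1, 0) is an Adler-style over-relaxed update. Each update keeps the
    conditional N(mu, cond_std**2) invariant, because
    alpha**2 * cond_std**2 + (1 - alpha**2) * cond_std**2 == cond_std**2.
    """
    rng = random.Random(seed)
    cond_std = math.sqrt(1.0 - rho ** 2)            # std of z1 | z2 and z2 | z1
    noise_scale = cond_std * math.sqrt(1.0 - alpha ** 2)
    z1, z2 = 0.0, 0.0                               # start at the mode
    samples = []
    for _ in range(n_sweeps):
        mu1 = rho * z2                              # conditional mean of z1 | z2
        z1 = mu1 + alpha * (z1 - mu1) + noise_scale * rng.gauss(0.0, 1.0)
        mu2 = rho * z1                              # conditional mean of z2 | z1
        z2 = mu2 + alpha * (z2 - mu2) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append((z1, z2))
    return samples


def sample_correlation(samples):
    """Pearson correlation between the two coordinates of the samples."""
    n = len(samples)
    m1 = sum(s[0] for s in samples) / n
    m2 = sum(s[1] for s in samples) / n
    cov = sum((s[0] - m1) * (s[1] - m2) for s in samples) / n
    v1 = sum((s[0] - m1) ** 2 for s in samples) / n
    v2 = sum((s[1] - m2) ** 2 for s in samples) / n
    return cov / math.sqrt(v1 * v2)


if __name__ == "__main__":
    for alpha in (0.0, -0.5):
        draws = overrelaxed_gibbs(rho=0.9, alpha=alpha, n_sweeps=20000, seed=1)
        print(alpha, sample_correlation(draws))
```

Both settings should recover a sample correlation close to `rho`. They differ in how quickly successive draws decorrelate.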