Variational Inference: A Review for Statisticians

David M. Blei, Alp Kucukelbir, Jon D. McAuliffe

TL;DR

This paper surveys variational inference (VI) as a scalable alternative to sampling-based Bayesian computation, recasting posterior approximation as optimization over a variational family and introducing the evidence lower bound (ELBO) as a central objective. It develops the mean-field variational framework, derives coordinate ascent updates, and demonstrates a complete Bayesian Gaussian mixture example to illustrate practical updates and convergence behavior. The discussion extends VI to exponential-family models, conjugacy, and stochastic variational inference (SVI) for large datasets, and covers nonconjugate and black-box approaches, emphasizing applications, theory, and open problems. Collectively, the work clarifies VI's foundations, connections to classical inference, and its potential and limitations for scalable Bayesian computation across diverse domains.

Abstract

One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this paper, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this class of algorithms.
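The abstract's idea of measuring closeness by Kullback-Leibler divergence can be checked numerically through the standard identity $\log p(x) = \mathrm{ELBO}(q) + \mathrm{KL}(q(z) \,\|\, p(z \mid x))$. The sketch below is only an illustration under stated assumptions: the three-state latent variable, the prior and likelihood values, and the helper names `elbo` and `kl` are all hypothetical and do not come from the paper.

```python
import math

# Hypothetical toy model (not from the paper): latent z in {0, 1, 2}
# and a single fixed observation x.
prior = [0.5, 0.3, 0.2]            # p(z)
likelihood = [0.10, 0.40, 0.70]    # p(x | z), evaluated at the observed x

joint = [p * l for p, l in zip(prior, likelihood)]   # p(z, x)
evidence = sum(joint)                                # p(x)
posterior = [j / evidence for j in joint]            # p(z | x)


def elbo(q):
    """ELBO(q) = E_q[log p(z, x)] - E_q[log q(z)]."""
    return sum(qi * (math.log(ji) - math.log(qi)) for qi, ji in zip(q, joint))


def kl(q, p):
    """KL(q || p) for two discrete distributions with full support."""
    return sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p))


# An arbitrary candidate approximation q(z).
q = [0.2, 0.5, 0.3]

log_evidence = math.log(evidence)
gap = log_evidence - elbo(q)   # should equal KL(q || posterior)

print("log p(x)          :", log_evidence)
print("ELBO(q)           :", elbo(q))
print("log p(x) - ELBO(q):", gap)
print("KL(q || p(z | x)) :", kl(q, posterior))
```

Because $\log p(x)$ does not depend on $q$, raising the ELBO is the same as lowering the KL divergence to the posterior, which is why optimization can stand in for direct posterior computation. In this toy case the posterior is available in closed form, so the gap can be inspected directly; in realistic models only the ELBO is computable.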

Paper Structure

This paper contains 24 sections, 73 equations, 11 figures, 3 algorithms.

Figures (11)

  • Figure 1: Visualizing the mean-field approximation to a two-dimensional Gaussian posterior. The ellipses show the effect of mean-field factorization. (The ellipses are $2 \sigma$ contours of the Gaussian distributions.)
  • Figure 3: Different initializations may lead coordinate ascent to find different local optima of the ELBO.
  • Figure 4: Coordinate ascent variational inference (CAVI) for a Gaussian mixture model
  • Figure 5: A simulation study of a two dimensional Gaussian mixture model. The ellipses are $2\sigma$ contours of the variational approximating factors.
  • Figure 6: Red, green, and blue channel image histograms for two images from the imageclef dataset. The top image lacks blue hues, which is reflected in its blue channel histogram. The bottom image has a few dominant shades of blue and green, as seen in the peaks of its histogram.
  • ...and 6 more figures