Variational Inference: A Review for Statisticians
David M. Blei, Alp Kucukelbir, Jon D. McAuliffe
TL;DR
This paper surveys variational inference (VI) as a scalable alternative to sampling-based Bayesian computation. It recasts posterior approximation as optimization over a variational family and presents the evidence lower bound (ELBO) as the central objective. It develops the mean-field variational framework, derives coordinate ascent updates, and works through a complete Bayesian Gaussian mixture example to illustrate the updates and their convergence behavior. The discussion then extends VI to exponential-family models with conjugacy, derives stochastic variational inference (SVI) for large datasets, and surveys nonconjugate and black-box approaches, applications, theory, and open problems. Collectively, the work clarifies VI's foundations, its connections to classical inference, and its potential and limitations for scalable Bayesian computation across diverse domains.
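The ELBO and the mean-field coordinate ascent update described above can be stated compactly as follows. The notation is standard rather than quoted from the paper: latent variables z, observations x, variational density q, and a mean-field factorization over m blocks z_j. Because the KL divergence is nonnegative, the ELBO is a lower bound on the log evidence. For that reason, maximizing it over q is equivalent to minimizing the KL divergence to the posterior.

```latex
\begin{aligned}
\mathrm{ELBO}(q) &= \mathbb{E}_{q}\!\left[\log p(\mathbf{z}, \mathbf{x})\right] - \mathbb{E}_{q}\!\left[\log q(\mathbf{z})\right], \\
\log p(\mathbf{x}) &= \mathrm{ELBO}(q) + \mathrm{KL}\!\left(q(\mathbf{z}) \,\|\, p(\mathbf{z} \mid \mathbf{x})\right) \;\ge\; \mathrm{ELBO}(q), \\
q(\mathbf{z}) &= \prod_{j=1}^{m} q_j(z_j) \qquad \text{(mean-field family)}, \\
q_j^{*}(z_j) &\propto \exp\!\left\{ \mathbb{E}_{-j}\!\left[\log p(z_j \mid \mathbf{z}_{-j}, \mathbf{x})\right] \right\} \qquad \text{(coordinate ascent update)}.
\end{aligned}
```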
Abstract
One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this paper, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this class of algorithms.
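The following is a minimal sketch of coordinate ascent VI (CAVI) for a one-dimensional Bayesian mixture of unit-variance Gaussians, the kind of model the abstract refers to. Several details are illustrative assumptions rather than the paper's code: the prior mu_k ~ N(0, sigma2) with a default sigma2 = 10, a uniform prior over cluster assignments, initialization at randomly chosen data points, and the function names `cavi_gmm` and `elbo`. Each coordinate update maximizes the ELBO in its own coordinate, so the recorded ELBO should not decrease from one sweep to the next. Tracking it is therefore a cheap correctness check.

```python
import math
import random


def elbo(x, phi, m, s2, sigma2):
    """Evidence lower bound for the 1-D Bayesian Gaussian mixture (illustrative)."""
    K = len(m)
    val = 0.0
    for k in range(K):
        e_mu2 = s2[k] + m[k] ** 2  # E_q[mu_k^2]
        # E_q[log p(mu_k)] under the N(0, sigma2) prior
        val += -0.5 * math.log(2 * math.pi * sigma2) - e_mu2 / (2 * sigma2)
        # Entropy of q(mu_k) = N(m_k, s2_k)
        val += 0.5 * math.log(2 * math.pi * math.e * s2[k])
    for xi, p in zip(x, phi):
        val += math.log(1.0 / K)  # E_q[log p(c_i)] under the uniform prior
        for k in range(K):
            if p[k] > 0.0:  # 0 * log 0 is taken to be 0
                e_mu2 = s2[k] + m[k] ** 2
                # E_q[log N(x_i; mu_k, 1)]
                loglik = -0.5 * math.log(2 * math.pi) - 0.5 * (xi ** 2 - 2 * xi * m[k] + e_mu2)
                # Expected log-likelihood minus E_q[log q(c_i)]
                val += p[k] * (loglik - math.log(p[k]))
    return val


def cavi_gmm(x, K, sigma2=10.0, iters=100, tol=1e-8, seed=0):
    """CAVI for mu_k ~ N(0, sigma2), c_i ~ Cat(1/K), x_i | c_i, mu ~ N(mu_{c_i}, 1).

    Variational family: q(mu_k) = N(m_k, s2_k) and q(c_i) = Cat(phi_i).
    """
    rng = random.Random(seed)
    m = [rng.choice(x) for _ in range(K)]  # initialize means at random data points
    s2 = [1.0] * K
    phi = []
    elbos = []
    for _ in range(iters):
        # Assignment update: phi_ik proportional to exp(E[mu_k] x_i - E[mu_k^2] / 2)
        phi = []
        for xi in x:
            logits = [m[k] * xi - 0.5 * (s2[k] + m[k] ** 2) for k in range(K)]
            top = max(logits)  # subtract the max for numerical stability
            w = [math.exp(v - top) for v in logits]
            total = sum(w)
            phi.append([wk / total for wk in w])
        # Component update: Gaussian q(mu_k) with precision 1/sigma2 + sum_i phi_ik
        for k in range(K):
            prec = 1.0 / sigma2 + sum(p[k] for p in phi)
            m[k] = sum(p[k] * xi for p, xi in zip(phi, x)) / prec
            s2[k] = 1.0 / prec
        elbos.append(elbo(x, phi, m, s2, sigma2))
        if len(elbos) > 1 and abs(elbos[-1] - elbos[-2]) < tol:
            break  # stop once the ELBO has effectively stopped changing
    return m, s2, phi, elbos


# Usage: synthetic data drawn around three well-separated centers.
rng = random.Random(1)
data = [rng.gauss(c, 1.0) for c in (-5.0, 0.0, 5.0) for _ in range(100)]
m, s2, phi, elbos = cavi_gmm(data, K=3)
print("means:", sorted(round(v, 2) for v in m))
print("sweeps:", len(elbos), "final ELBO:", round(elbos[-1], 2))
```

CAVI only finds a local optimum of the ELBO, and the result depends on the initialization. A common practical remedy is to run several random restarts (different `seed` values here) and keep the fit with the highest final ELBO.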
