Advances in Variational Inference
Cheng Zhang, Judith Butepage, Hedvig Kjellstrom, Stephan Mandt
TL;DR
This work surveys variational inference across four interconnected axes: scalability, general applicability beyond conjugate models, accuracy of approximations, and amortized inference for rapid, data-conditional predictions. It foregrounds stochastic variational inference and black-box and reparameterization-based methods to handle large datasets and non-conjugate models, while also detailing structured and alternative-divergence approaches to improve posterior fidelity. The paper highlights advances in VAEs, normalizing flows, Stein-discrepancy methods, and hierarchical and temporal variational forms as major pillars enabling Bayesian deep learning and scalable probabilistic modeling. By outlining practical tricks, theoretical developments, and probabilistic programming tools, it argues that VI is central to modern uncertainty-aware AI, with broad applicability to deep generative modeling, time-series analysis, and large-scale inference. The review concludes with a roadmap for automatic VI and future research directions in theory, automation, and integration with deep learning systems.
Abstract
Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and thus require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. This approach has been successfully used in various models and large-scale applications. In this review, we give an overview of recent trends in variational inference. We first introduce standard mean field variational inference, then review recent advances focusing on the following aspects: (a) scalable VI, which includes stochastic approximations, (b) generic VI, which extends the applicability of VI to a large class of otherwise intractable models, such as non-conjugate models, (c) accurate VI, which includes variational models beyond the mean field approximation or with atypical divergences, and (d) amortized VI, which implements the inference over local latent variables with inference networks. Finally, we provide a summary of promising future research directions.
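
For orientation, the optimization problem mentioned in the abstract is commonly written in terms of the evidence lower bound (ELBO). The notation below is assumed here and does not come from the abstract: x denotes the observations, z the latent variables, and q(z; \lambda) the variational distribution with free parameters \lambda.

```latex
\log p(x)
  = \underbrace{\mathbb{E}_{q(z;\lambda)}\big[\log p(x, z) - \log q(z;\lambda)\big]}_{\mathcal{L}(\lambda)\ \text{(ELBO)}}
  + \mathrm{KL}\big(q(z;\lambda)\,\|\,p(z \mid x)\big),
\qquad
\lambda^{*} = \arg\max_{\lambda}\, \mathcal{L}(\lambda).
```

Because \log p(x) does not depend on \lambda, maximizing \mathcal{L}(\lambda) is equivalent to minimizing the KL divergence from q(z; \lambda) to the exact posterior, which is how inference becomes an optimization problem.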
