Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box
Ryan Giordano, Martin Ingram, Tamara Broderick
TL;DR
This paper introduces deterministic ADVI (DADVI), which replaces the intractable mean-field variational Bayes (MFVB) objective with a sample average over a fixed set of Monte Carlo draws, a technique known as the sample average approximation (SAA). Because the resulting objective is deterministic, DADVI can use off-the-shelf second-order optimization and can compute linear response (LR) covariances, which improve posterior uncertainty estimates without sacrificing speed. The authors give theoretical results showing that, for certain classes of common statistical problems, DADVI performs well with a small fixed number of draws even in very high dimensions, and that these favorable results cannot extend to highly expressive variational families such as full-rank ADVI. Experiments on a variety of real-world problems show that DADVI often converges faster than standard ADVI and, with LR covariances, yields more accurate uncertainty quantification. The authors also present practical ways to estimate the Monte Carlo error. Overall, the work argues that a deterministic SAA objective can deliver robust, scalable black-box variational inference with reliable convergence diagnostics and uncertainty quantification.
Abstract
Automatic differentiation variational inference (ADVI) offers fast and easy-to-use posterior approximation in multiple modern probabilistic programming languages. However, its stochastic optimizer lacks clear convergence criteria and requires tuning parameters. Moreover, ADVI inherits the poor posterior uncertainty estimates of mean-field variational Bayes (MFVB). We introduce "deterministic ADVI" (DADVI) to address these issues. DADVI replaces the intractable MFVB objective with a fixed Monte Carlo approximation, a technique known in the stochastic optimization literature as the "sample average approximation" (SAA). By optimizing an approximate but deterministic objective, DADVI can use off-the-shelf second-order optimization, and, unlike standard mean-field ADVI, is amenable to more accurate posterior covariances via linear response (LR). In contrast to existing worst-case theory, we show that, on certain classes of common statistical problems, DADVI and the SAA can perform well with relatively few samples even in very high dimensions, though we also show that such favorable results cannot extend to variational approximations that are too expressive relative to mean-field ADVI. We show on a variety of real-world problems that DADVI reliably finds good solutions with default settings (unlike ADVI) and, together with LR covariances, is typically faster and more accurate than standard ADVI.
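The SAA idea described in the abstract can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the toy Gaussian target (`mu`, `A`), the function names, the choice of 30 draws, and the use of SciPy's BFGS (a quasi-Newton method standing in here for the second-order optimizers the abstract mentions) are all hypothetical choices made for this example.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy target: an unnormalized Gaussian log density with
# correlated coordinates. mu and A (a precision matrix) are made up.
mu = np.array([1.0, -2.0])
A = np.array([[2.0, 0.6], [0.6, 1.0]])
dim = mu.size

# The key step: draw the standard normal noise ONCE and keep it fixed.
num_draws = 30  # illustrative choice, not a recommendation
z = np.random.default_rng(0).standard_normal((num_draws, dim))


def neg_log_p_and_grad(theta):
    """Negative log density and its gradient for each row of theta."""
    d = theta - mu
    Ad = d @ A  # A is symmetric, so each row equals A @ d_n
    return 0.5 * np.sum(d * Ad, axis=1), Ad


def saa_objective(eta):
    """Deterministic estimate of KL(q || p), up to an additive constant.

    eta stacks the mean and log standard deviation of a mean-field
    Gaussian q. Returns the value and its gradient with respect to eta.
    """
    mean, log_sd = eta[:dim], eta[dim:]
    sd = np.exp(log_sd)
    theta = mean + sd * z  # reparameterized draws, shape (num_draws, dim)
    nlp, g = neg_log_p_and_grad(theta)
    value = nlp.mean() - log_sd.sum()  # -E_q[log p] minus Gaussian entropy
    grad_mean = g.mean(axis=0)
    grad_log_sd = (g * sd * z).mean(axis=0) - 1.0
    return value, np.concatenate([grad_mean, grad_log_sd])


# Because the objective is deterministic, a generic optimizer applies as-is.
eta0 = np.zeros(2 * dim)
result = minimize(saa_objective, eta0, jac=True, method="BFGS")
mean_hat, sd_hat = result.x[:dim], np.exp(result.x[dim:])
print("approximate posterior means:", mean_hat)
print("approximate posterior sds:  ", sd_hat)
```

Because `z` is frozen, repeated evaluations at the same parameters return identical values, so the optimizer's standard convergence criteria apply. Redrawing `z` at every evaluation would instead give a noisy objective of the kind that ADVI's stochastic optimizer works with.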
