Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box

Ryan Giordano; Martin Ingram; Tamara Broderick

TL;DR

This paper introduces deterministic ADVI (DADVI), replacing the intractable mean-field VB objective with a fixed Monte Carlo sample average to form a deterministic objective. This enables off-the-shelf second-order optimization and the computation of Linear Response (LR) covariances, improving posterior uncertainty estimates without sacrificing speed. The authors provide theoretical results showing favorable behavior of DADVI with a small fixed number of draws in certain high-dimensional structures, and they demonstrate through extensive experiments that DADVI often converges faster and yields more accurate uncertainty quantification (via LRVB) than standard ADVI. They also present practical ways to estimate Monte Carlo error and discuss the limitations of DADVI for highly expressive variational families like full-rank ADVI. Overall, the work argues that deterministic objectives and SAA can deliver robust, scalable black-box variational inference with reliable convergence diagnostics and uncertainty quantification.
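As a sketch of the substitution described above, the mean-field objective and its sample average approximation can be written as follows. The notation is partly assumed here: the symbols $\eta = (\mu, \log \sigma)$, $\theta(\eta, z)$, and $z_n$ are illustrative, while $N$, $D_{\theta}$, and $\mathscr{Z}$ follow the Key Result section.

$$
\mathrm{KL}(\eta) = -\mathbb{E}_{z \sim \mathcal{N}(0, I)}\left[\log p(\theta(\eta, z), y)\right] - \sum_{d=1}^{D_{\theta}} \log \sigma_{d} + \text{const},
\qquad \theta(\eta, z) = \mu + \sigma \odot z
$$

$$
\widehat{\mathrm{KL}}(\eta \mid \mathscr{Z}) = -\frac{1}{N} \sum_{n=1}^{N} \log p(\theta(\eta, z_{n}), y) - \sum_{d=1}^{D_{\theta}} \log \sigma_{d} + \text{const},
\qquad \mathscr{Z} = \{z_{1}, \ldots, z_{N}\} \text{ drawn once and held fixed}
$$

Because $\mathscr{Z}$ does not change between iterations, the second expression is an ordinary deterministic function of $\eta$, so its gradient and Hessian can be evaluated exactly by automatic differentiation rather than estimated from fresh noise.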

Abstract

Automatic differentiation variational inference (ADVI) offers fast and easy-to-use posterior approximation in multiple modern probabilistic programming languages. However, its stochastic optimizer lacks clear convergence criteria and requires tuning parameters. Moreover, ADVI inherits the poor posterior uncertainty estimates of mean-field variational Bayes (MFVB). We introduce "deterministic ADVI" (DADVI) to address these issues. DADVI replaces the intractable MFVB objective with a fixed Monte Carlo approximation, a technique known in the stochastic optimization literature as the "sample average approximation" (SAA). By optimizing an approximate but deterministic objective, DADVI can use off-the-shelf second-order optimization, and, unlike standard mean-field ADVI, is amenable to more accurate posterior covariances via linear response (LR). In contrast to existing worst-case theory, we show that, on certain classes of common statistical problems, DADVI and the SAA can perform well with relatively few samples even in very high dimensions, though we also show that such favorable results cannot extend to variational approximations that are too expressive relative to mean-field ADVI. We show on a variety of real-world problems that DADVI reliably finds good solutions with default settings (unlike ADVI) and, together with LR covariances, is typically faster and more accurate than standard ADVI.
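The idea of optimizing a fixed-draw objective can be illustrated with a minimal sketch on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-dimensional target $\log p(\theta) = -\tfrac{a}{2}(\theta - m)^2$, the function names `draw_fixed_z`, `make_objective`, and `fit_dadvi`, and the hand-written Newton iteration with backtracking, which stands in for an off-the-shelf second-order optimizer.

```python
import math
import random


def draw_fixed_z(n, seed=0):
    """Draw N standard-normal values once; they stay fixed during optimization."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def make_objective(z, m, a):
    """SAA objective for the toy target log p(theta) = -0.5 * a * (theta - m)**2.

    Parameters are (mu, rho) with sigma = exp(rho) and theta = mu + sigma * z.
    The target, m, and a are illustrative assumptions.
    """
    n = len(z)
    zbar = sum(z) / n
    z2bar = sum(v * v for v in z) / n

    def value(mu, rho):
        s, d = math.exp(rho), mu - m
        # Sample average of -log p(theta_n), minus the entropy term log(sigma) = rho.
        return 0.5 * a * (d * d + 2.0 * d * s * zbar + s * s * z2bar) - rho

    def grad(mu, rho):
        s, d = math.exp(rho), mu - m
        return a * (d + s * zbar), a * (d * s * zbar + s * s * z2bar) - 1.0

    def hess(mu, rho):
        s, d = math.exp(rho), mu - m
        # Entries (mu-mu, mu-rho, rho-rho) of the symmetric 2x2 Hessian.
        return a, a * s * zbar, a * (d * s * zbar + 2.0 * s * s * z2bar)

    return value, grad, hess


def fit_dadvi(z, m, a, tol=1e-12, max_iter=100):
    """Minimize the fixed-draw objective with Newton's method and backtracking."""
    value, grad, hess = make_objective(z, m, a)
    mu, rho = 0.0, 0.0
    for _ in range(max_iter):
        g_mu, g_rho = grad(mu, rho)
        h_mm, h_mr, h_rr = hess(mu, rho)
        det = h_mm * h_rr - h_mr * h_mr
        if h_mm > 0.0 and det > 0.0:
            # Newton direction: solve the 2x2 system H * step = -g.
            s_mu = -(h_rr * g_mu - h_mr * g_rho) / det
            s_rho = -(h_mm * g_rho - h_mr * g_mu) / det
        else:
            # Fall back to steepest descent if the Hessian is not positive definite.
            s_mu, s_rho = -g_mu, -g_rho
        slope = g_mu * s_mu + g_rho * s_rho  # directional derivative, <= 0
        if -slope < tol:
            # Deterministic stopping rule: predicted decrease is negligible.
            mu, rho = mu + s_mu, rho + s_rho
            break
        t, f0 = 1.0, value(mu, rho)
        # Armijo backtracking; meaningful only because the objective is noise-free.
        while value(mu + t * s_mu, rho + t * s_rho) > f0 + 1e-4 * t * slope and t > 1e-10:
            t *= 0.5
        mu, rho = mu + t * s_mu, rho + t * s_rho
    return mu, math.exp(rho)
```

For example, `fit_dadvi(draw_fixed_z(30), m=1.5, a=4.0)` returns the same `(mu, sigma)` pair on every call because the draws are fixed. The exact mean-field optimum for this toy target is `mu = 1.5`, `sigma = 0.5`; the fitted values differ from it only through Monte Carlo error in the fixed draws, not through optimizer noise.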

Paper Structure (29 sections, 4 theorems, 65 equations, 7 figures, 1 table, 2 algorithms)

Introduction
Setup
Our Method
Linear response covariances
Monte Carlo error estimation
Computational considerations
Considerations in high dimensions
High dimensional normals
High dimensional local variables
DADVI fails for full-rank ADVI
Related work
Experiments
Models and data
Computational cost
Posterior Accuracy
...and 14 more sections

Key Result

Proposition 1

Consider any parameter dimension index $d \in \{1, \ldots, D_{\theta}\}$, selected independently of $\mathscr{Z}$. In the quadratic model, we have $\hat{\sigma}_{d}^{-2} - \accentset{*}{\sigma}_{d}^{-2}= O_p(N^{-1/2})$ and $\hat{\mu}_{d} - \accentset{*}{\mu}_{d} = O_p(N^{-1/2})$. The constants do no
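The rates stated in the proposition can be checked by hand in a one-dimensional special case. The following is an illustrative sketch, not the paper's proof, and it assumes a scalar target $\log p(\theta, y) = -\tfrac{a}{2}(\theta - m)^2 + \text{const}$ with $a > 0$, the reparameterization $\theta = \mu + \sigma z$, and the shorthand $\bar{z} = N^{-1}\sum_{n} z_{n}$ and $\overline{z^2} = N^{-1}\sum_{n} z_{n}^2$ for the fixed draws in $\mathscr{Z}$.

$$
\hat{F}(\mu, \sigma) = \frac{a}{2}\left[(\mu - m)^2 + 2(\mu - m)\,\sigma\,\bar{z} + \sigma^2\,\overline{z^2}\right] - \log \sigma
$$

$$
\frac{\partial \hat{F}}{\partial \mu} = 0 \;\Rightarrow\; \hat{\mu} = m - \hat{\sigma}\,\bar{z},
\qquad
\frac{\partial \hat{F}}{\partial \sigma} = 0 \;\Rightarrow\; \hat{\sigma}^{-2} = a\left(\overline{z^2} - \bar{z}^2\right)
$$

Replacing the sample moments with their expectations ($\bar{z} \to 0$, $\overline{z^2} \to 1$) gives the exact optimum $\accentset{*}{\mu} = m$ and $\accentset{*}{\sigma}^{-2} = a$, so

$$
\hat{\sigma}^{-2} - \accentset{*}{\sigma}^{-2} = a\left(\overline{z^2} - \bar{z}^2 - 1\right) = O_p(N^{-1/2}),
\qquad
\hat{\mu} - \accentset{*}{\mu} = -\hat{\sigma}\,\bar{z} = O_p(N^{-1/2}),
$$

where both rates follow from the central limit theorem applied to $\bar{z}$ and $\overline{z^2}$.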

Figures (7)

Figure 1: Runtimes and model evaluation counts for the ARM models. Results are reported divided by the corresponding value for DADVI or LRVB. Numbers greater than one (shown by the black line) indicate favorable performance by DADVI or LRVB. Recall that the reported LRVB numbers include the cost of the DADVI optimization as well as the LR covariances. Most of the ARM models are relatively low-dimensional, so the LR covariances added little to the computation.
Figure 2: Runtimes and model evaluation counts for the non-ARM models. Results are reported divided by the corresponding value for DADVI or LRVB. Numbers greater than one (shown by the black line) indicate favorable performance by DADVI or LRVB. Recall that the reported LRVB numbers include the cost of the DADVI optimization as well as the LR covariances. Missing model and method combinations are marked with an X.
Figure 3: Posterior accuracy measures for the ARM models. Each point is a single named parameter in a single model. Points above the diagonal line indicate better DADVI or LRVB performance. Level curves of a 2D density estimator are shown to help visualize overplotting.
Figure 4: Posterior accuracy measures for the non-ARM models. Each point is a single named parameter in a single model. Points above the diagonal line indicate better DADVI or LRVB performance.
Figure 5: Optimization traces for the ARM models. Black dots show the termination point of each method. Dots above the horizontal black line mean that DADVI found a better ELBO. Dots to the right of the vertical black line mean that DADVI terminated sooner in terms of model evaluations.
...and 2 more figures

Theorems & Definitions (13)

Proposition 1
proof
Remark 1
Proposition 2
Definition 1
Example 1
Example 2
Theorem 3
proof : sketch
Theorem 4
...and 3 more
