Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?

Kyurae Kim; Yian Ma; Jacob R. Gardner

TL;DR

The paper proves that black-box variational inference (BBVI) using the sticking-the-landing (STL) estimator achieves linear convergence when the variational family contains the true posterior, yielding an iteration complexity of $O(d \kappa^2 \log(1/\epsilon))$ under a triangular-scale parameterization with $\Theta(d)$ projection cost. It develops a quadratic variance framework for STL, introduces adaptive variance bounds via the Peter–Paul inequality, and analyzes both the STL and the closed-form entropy (CFE) estimators, including non-asymptotic complexity results for strongly log-concave posteriors. The triangular parameterization improves computational efficiency over previous SVD-based projections, while the results extend to misspecified settings through the Fisher–Hyvärinen divergence $D_{F^4}$, yielding meaningful lower bounds and interpolation insights. The work clarifies when STL offers advantages relative to CFE and provides practical guidance for BBVI in Gaussian-like models, emphasizing the role of posterior correlation and variance reduction in achieving fast convergence. Overall, the paper delivers rigorous, non-asymptotic guarantees that connect estimator design, parameterization, and convergence behavior in BBVI.

Abstract

We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called "linear") rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. For the projection operator, we consider a domain with triangular scale matrices, onto which the projection is computable in $\Theta(d)$ time, where $d$ is the dimensionality of the target posterior. We also improve the existing analysis of the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator, providing explicit non-asymptotic complexity guarantees for both.
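To make the difference between the two estimators concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a Gaussian variational family with location `m` and lower-triangular scale `C` (so `z = C u + m` with `u` standard normal), a Gaussian target whose score is available in closed form, and single-sample gradients of the negative ELBO derived by hand. The function names `gaussian_score`, `stl_gradient`, and `cfe_gradient` are illustrative and do not come from the paper.

```python
import numpy as np

def gaussian_score(z, mu, L):
    """Score (gradient of the log-density) of N(mu, L L^T) at z."""
    w = np.linalg.solve(L, z - mu)        # L^{-1} (z - mu)
    return -np.linalg.solve(L.T, w)       # -L^{-T} L^{-1} (z - mu)

def stl_gradient(m, C, u, target_score):
    """Single-sample STL gradient of the negative ELBO w.r.t. (m, C).

    The score of q is evaluated with the variational parameters held
    fixed, so only the path through z = C u + m is differentiated.
    """
    z = C @ u + m
    s_q = -np.linalg.solve(C.T, u)        # score of q at z, equal to -C^{-T} u
    delta = target_score(z) - s_q         # score mismatch between target and q
    g_m = -delta
    g_C = np.tril(np.outer(-delta, u))    # keep only the lower-triangular part
    return g_m, g_C

def cfe_gradient(m, C, u, target_score):
    """Single-sample CFE gradient: sampled energy term plus exact entropy term."""
    z = C @ u + m
    s_pi = target_score(z)
    g_m = -s_pi
    # The entropy of q is sum_i log C_ii + const, assuming a positive diagonal.
    g_C = np.tril(np.outer(-s_pi, u)) - np.diag(1.0 / np.diag(C))
    return g_m, g_C

# Perfect specification: the variational parameters equal the target's.
rng = np.random.default_rng(0)
d = 3
L = np.tril(rng.normal(size=(d, d)))
L[np.diag_indices(d)] = np.abs(np.diag(L)) + 1.0   # positive diagonal
mu = rng.normal(size=d)
u = rng.normal(size=d)
score = lambda z: gaussian_score(z, mu, L)

stl_m, stl_C = stl_gradient(mu, L, u, score)
cfe_m, cfe_C = cfe_gradient(mu, L, u, score)
print("STL max |grad|:", np.abs(stl_m).max(), np.abs(stl_C).max())
print("CFE max |grad|:", np.abs(cfe_m).max(), np.abs(cfe_C).max())
```

At the optimum of a perfectly specified family, the two scores cancel for every draw of `u`, so the STL gradient is zero up to floating-point error, whereas the CFE gradient is zero only in expectation. This is the zero-variance-at-the-optimum behavior that the interpolation condition refers to.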

Paper Structure (73 sections, 3 theorems, 161 equations, 2 tables)

INTRODUCTION
Contributions
PRELIMINARIES
Notation
Variational Inference
Black-Box Variational Inference
Variational Family
Scale Parameterization
Gradient Estimators
Closed-Form Entropy Estimator
Sticking-the-Landing Estimator
Quadratic Variance Condition
Interpolation Condition
Achieving "Interpolation"
Does STL "Interpolate?"
...and 58 more sections

Key Result

Proposition 1

The Euclidean projection operator onto $\Lambda_S$, $\mathrm{proj}_{\Lambda_S} : \mathbb{R}^d \times \mathbb{L}^d \to \Lambda_{S}$, is given as where $\widetilde{\mathbfit{C}}$ is the projection of $\mathbfit{C}$ such that
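The displayed formulas of this proposition are not reproduced in the text above, so the following is only a sketch under an assumed form of the domain. It assumes that $\Lambda_S$ leaves the location and the off-diagonal entries of the lower-triangular scale unconstrained and requires each diagonal entry to be at least $1/\sqrt{S}$. Under that assumption, the Euclidean projection reduces to clamping the diagonal, which modifies only $d$ entries and is consistent with the $\Theta(d)$ cost stated in the abstract. The function name `project_triangular_scale` is hypothetical.

```python
import numpy as np

def project_triangular_scale(m, C, S):
    """Project (m, C) onto an ASSUMED domain {C lower-triangular, C_ii >= 1/sqrt(S)}.

    Assumption: the constraint acts only on the diagonal of C, so the
    Frobenius-norm projection clamps the d diagonal entries and leaves
    the location m and the off-diagonal entries unchanged.
    """
    C_proj = C.copy()                      # copy for safety; only d entries change
    idx = np.diag_indices_from(C_proj)
    C_proj[idx] = np.maximum(C_proj[idx], 1.0 / np.sqrt(S))
    return m, C_proj

# Example: with S = 4 the lower bound on the diagonal is 0.5.
m = np.array([1.0, -2.0])
C = np.array([[0.1, 0.0],
              [2.0, 3.0]])
m_proj, C_proj = project_triangular_scale(m, C, S=4.0)
print(C_proj)
```

In a projected stochastic gradient descent loop, this step would be applied after every gradient update so that the iterates stay in the domain.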

Theorems & Definitions (30)

Definition 1: Fisher-Hyvärinen Divergence
Definition 2: Reparameterized Family
Definition 3: Location-Scale Reparameterization Function
Definition 4: Closed-Form Entropy Estimator
Definition 5: Sticking-the-Landing Estimator; STL
Definition 6: Quadratic Variance; QV
Definition 7: Interpolation
Proposition 1
proof
Definition 8
...and 20 more
