Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?
Kyurae Kim, Yian Ma, Jacob R. Gardner
TL;DR
The paper proves that black-box variational inference (BBVI) with the sticking-the-landing (STL) estimator converges linearly when the variational family contains the true posterior, with an iteration complexity of $O(d \kappa^2 \log(1/\epsilon))$, where $d$ is the dimension and $\kappa$ the condition number, under a triangular-scale parameterization whose projection costs $\Theta(d)$. It establishes a quadratic variance bound for STL, derives adaptive variance bounds via the Peter–Paul inequality, and analyzes both the STL and the closed-form entropy (CFE) estimators, giving non-asymptotic complexity results for strongly log-concave posteriors. The triangular parameterization is cheaper than previous SVD-based projections, and the results extend to misspecified variational families through the Fisher–Hyvärinen divergence $D_{F^4}$, yielding lower bounds and an interpolation between the well-specified and misspecified regimes. The work clarifies when STL has an advantage over CFE and gives practical guidance for BBVI in Gaussian-like models, emphasizing the role of posterior correlation and variance reduction in achieving fast convergence. Overall, the paper provides rigorous, non-asymptotic guarantees that connect estimator design, parameterization, and convergence behavior in BBVI.
Abstract
We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called "linear") rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which also encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with projected stochastic gradient descent. For the projection operator, we consider a domain with triangular scale matrices, onto which the projection is computable in $\Theta(d)$ time, where $d$ is the dimensionality of the target posterior. We also improve the existing analysis of the regular closed-form entropy gradient estimator, which enables comparison against the STL estimator and provides explicit non-asymptotic complexity guarantees for both.
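To make the role of the variance bound concrete, the following is a minimal sketch, not the authors' implementation, of why perfect specification matters for STL. It assumes a one-dimensional Gaussian target $N(\mu, \sigma^2)$ and a Gaussian variational family $q = N(m, s^2)$ with reparameterization $z = m + s u$, $u \sim N(0, 1)$. All function names (`cfe_gradient`, `stl_gradient`, `gradient_variances`) are hypothetical, and gradients are written for ELBO ascent. In this toy setting the STL gradient vanishes for every sample at the optimum, while the CFE gradient remains noisy.

```python
import random
import statistics


def grad_log_p(z, mu, sigma):
    """Score of the assumed N(mu, sigma^2) target with respect to z."""
    return -(z - mu) / sigma**2


def cfe_gradient(u, m, s, mu, sigma):
    """Closed-form entropy (CFE) estimator of the ELBO gradient w.r.t. (m, s).

    Uses the exact Gaussian entropy, whose derivative w.r.t. s is 1/s.
    """
    z = m + s * u
    score = grad_log_p(z, mu, sigma)
    return score, score * u + 1.0 / s


def stl_gradient(u, m, s, mu, sigma):
    """Sticking-the-landing (STL) estimator of the ELBO gradient w.r.t. (m, s).

    Differentiates log p(z) - log q(z) only through the sample path z,
    holding the parameters inside log q fixed (the "stop gradient").
    """
    z = m + s * u
    # d/dz [-log q(z; m, s)] with (m, s) held fixed equals (z - m) / s^2.
    path = grad_log_p(z, mu, sigma) + (z - m) / s**2
    # Chain rule through z = m + s * u: dz/dm = 1, dz/ds = u.
    return path, path * u


def gradient_variances(estimator, m, s, mu, sigma, n=2000, seed=0):
    """Empirical variance of each gradient component over n base samples."""
    rng = random.Random(seed)
    grads = [estimator(rng.gauss(0.0, 1.0), m, s, mu, sigma) for _ in range(n)]
    var_m = statistics.pvariance([g[0] for g in grads])
    var_s = statistics.pvariance([g[1] for g in grads])
    return var_m, var_s


if __name__ == "__main__":
    mu, sigma = 1.0, 2.0
    # At the optimum (m, s) = (mu, sigma), q coincides with the target.
    print("STL at optimum:", gradient_variances(stl_gradient, mu, sigma, mu, sigma))
    print("CFE at optimum:", gradient_variances(cfe_gradient, mu, sigma, mu, sigma))
    # Away from the optimum both estimators are noisy.
    print("STL off optimum:", gradient_variances(stl_gradient, 0.0, 1.0, mu, sigma))
```

At the optimum the two score terms in `stl_gradient` cancel sample by sample, so its variance is zero up to floating-point error, whereas `cfe_gradient` keeps a strictly positive variance. This is only a one-dimensional illustration of the variance behavior that the analysis above exploits; it does not reproduce the paper's projection step or its complexity bounds.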
