Provably Scalable Black-Box Variational Inference with Structured Variational Families

Joohwan Ko; Kyurae Kim; Woo Chang Kim; Jacob R. Gardner

TL;DR

This work investigates the scalability limitations of BBVI when using full-rank variational covariances and demonstrates that structured location-scale variational families can dramatically improve computational efficiency. By focusing on triangular/bordered block-diagonal scale structures and employing proximal SGD, the authors prove $\mathcal{O}(N)$ iteration complexity for finite-sum hierarchical models, and validate these results with large-scale experiments on models with local variables. A key contribution is the formalization of hierarchical branched distributions and the analysis showing how gradient variance depends on an effective dimensionality $d^*$, which can be reduced by structure. The findings offer a principled trade-off between posterior expressiveness and computational tractability, with practical implications for scalable Bayesian inference in complex hierarchical settings.

Abstract

Variational families with full-rank covariance approximations are known not to work well in black-box variational inference (BBVI), both empirically and theoretically. In fact, recent computational complexity results for BBVI have established that full-rank variational families scale poorly with the dimensionality of the problem compared to e.g. mean-field families. This is particularly critical to hierarchical Bayesian models with local variables; their dimensionality increases with the size of the datasets. Consequently, one gets an iteration complexity with an explicit $\mathcal{O}(N^2)$ dependence on the dataset size $N$. In this paper, we explore a theoretical middle ground between mean-field variational families and full-rank families: structured variational families. We rigorously prove that certain scale matrix structures can achieve a better iteration complexity of $\mathcal{O}\left(N\right)$, implying better scaling with respect to $N$. We empirically verify our theoretical results on large-scale hierarchical models.
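To make the contrast between the three families concrete, the following is a minimal sketch of a bordered block-diagonal, lower-triangular sparsity pattern for the scale matrix. It assumes a layout in which a block for the global variables sits in the top-left corner, a dense border couples every local variable to the global ones, and each local variable has its own small triangular block; the paper's Figure 1 is the authoritative picture. The names `d_z` (global dimension), `d_y` (dimension of each local variable), `n` (number of datapoints), and both functions are hypothetical and introduced only for illustration.

```python
import numpy as np


def bordered_block_diagonal_mask(d_z, d_y, n):
    """Boolean mask of the entries allowed to be non-zero (assumed layout)."""
    d = d_z + n * d_y
    mask = np.zeros((d, d), dtype=bool)
    # Lower-triangular block for the global variables.
    mask[:d_z, :d_z] = np.tril(np.ones((d_z, d_z), dtype=bool))
    # Dense border: every local variable may depend on the global variables.
    mask[d_z:, :d_z] = True
    # One lower-triangular diagonal block per local variable.
    block = np.tril(np.ones((d_y, d_y), dtype=bool))
    for i in range(n):
        s = d_z + i * d_y
        mask[s:s + d_y, s:s + d_y] = block
    return mask


def count_scale_entries(d_z, d_y, n):
    """Number of free scale entries for each family at total dimension d."""
    d = d_z + n * d_y
    return {
        "mean_field": d,                      # diagonal only
        "structured": int(bordered_block_diagonal_mask(d_z, d_y, n).sum()),
        "full_rank": d * (d + 1) // 2,        # dense lower triangle
    }
```

Under this assumed layout the structured count grows linearly in `n`, while the dense lower triangle grows quadratically. This only illustrates the sparsity pattern and its storage cost; the iteration-complexity statements in the abstract come from the paper's gradient-variance analysis, not from counting entries.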

Paper Structure (71 sections, 7 theorems, 141 equations, 13 figures, 2 tables)

Introduction
Preliminaries
Notation
Variational Inference
Evidence Lower Bound
Finite-Sum Likelihood
Variational Family
Scale Parameterization
Triangular Scale
Stochastic Proximal Gradient Descent
Proximal SGD
Proximal SGD in BBVI
Theoretical Analysis
Fundamental Limits of Being Full-Rank
Iteration Complexity of Full-Rank Families
...and 56 more sections

Key Result

Theorem 1

Let $\ell$ be $\mu$-strongly convex and $L$-smooth. Then, the iteration complexity of being $\epsilon$-close to the global minimizer with proximal SGD BBVI is […], where $\kappa = L / \mu$ is the condition number, $\Delta_0 = {\left\lVert \boldsymbol{\lambda}_0 - \boldsymbol{\lambda}^* \right\rVert}_2$ is the distance between the initial point $\boldsymbol{\lambda}_0$ and the global optimum $\boldsymbol{\lambda}^* = \mathop{\mathrm{arg\,min}}\limits_{\boldsymbol{\lambda} \in \Lambda}$ […]
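As a rough illustration of what one proximal SGD iteration can look like for a location-scale family with a triangular scale, here is a minimal sketch. It assumes the objective splits into an energy term, estimated with the reparameterization $z = Cu + m$, plus a negative-entropy term of the form $-\sum_i \log C_{ii}$, whose proximal operator has the closed form used below. The function names, the stepsize name `gamma`, and the callable `grad_ell` are hypothetical; the paper's own algorithm and parameterization may differ in detail.

```python
import numpy as np


def prox_neg_log(c, gamma):
    """Closed-form prox of h(x) = -log(x) with stepsize gamma.

    Solves argmin_x  -log(x) + (x - c)**2 / (2 * gamma); always positive.
    """
    return 0.5 * (c + np.sqrt(c * c + 4.0 * gamma))


def energy_grad_estimate(m, C, grad_ell, rng):
    """One-sample reparameterization gradient of E[ell(C u + m)]."""
    u = rng.standard_normal(m.shape[0])
    g = grad_ell(C @ u + m)            # gradient of ell at the sample
    return g, np.tril(np.outer(g, u))  # gradients w.r.t. m and lower-tri C


def proximal_sgd_step(m, C, grad_m, grad_C, gamma):
    """Gradient step on the energy, then the prox on the diagonal of C."""
    m_new = m - gamma * grad_m
    C_new = np.tril(C - gamma * grad_C)
    idx = np.diag_indices_from(C_new)
    C_new[idx] = prox_neg_log(C_new[idx], gamma)
    return m_new, C_new
```

The proximal step keeps the diagonal of the scale strictly positive without a separate projection, which is one reason this style of update is convenient for analysis. A structured family would additionally multiply `grad_C` by its sparsity mask before the step; that detail is omitted here.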

Figures (13)

Figure 1: Visualization of $\boldsymbol{C}$ under the proposed structure. The colored entries are non-zero, while the white entries are filled with zeros.
Figure 2: Number of iterations $T$ required to obtain $\epsilon$ accuracy of variational families for a given stepsize $\gamma$. The structured family behaves similarly to the mean-field family, while the full-rank family requires significantly more iterations and also scales worse with respect to the number of datapoints $n$.
Figure 3: Scaling of variational families with respect to the number of datapoints $n$. The full-rank family exhibits worse scaling than the structured and mean-field families.
Figure 4: ELBO at $T = 5 \times 10^4$ versus the optimizer stepsize ($\gamma$) on the considered problems with varying dataset sizes. The solid lines are the median over 8 independent replications, while the colored bands mark the 80% empirical percentiles.
Figure 5: ELBO versus stepsize on rpoisson-small. The solid lines are the median, while the shaded regions are the 80% quantiles computed from 4 independent replications. Notice that the performance gap between full-rank and structured becomes narrower as we reduce the stepsize.
...and 8 more figures

Theorems & Definitions (29)

Definition 1: Location-Scale Family
Theorem 1: Domke et al. (2023); Kim et al. (2023)
Corollary: Informal
Remark 1: Sample Complexity
Remark 2: Are fewer parameters obviously better?
Remark 3: Where does $d$ come from?
Remark 4
Remark 5
Remark 6
Corollary 1
...and 19 more
