ELBOing Stein: Variational Bayes with Stein Mixture Inference
Ola Rønning, Eric Nalisnick, Christophe Ley, Padhraic Smyth, Thomas Hamelryck
TL;DR
This paper tackles variance collapse in Stein variational methods by introducing Stein Mixture Inference (SMI), which represents the variational posterior as a uniform mixture of $m$ guides, each parameterized by a particle. SMI optimizes a mixture ELBO, $\text{ELBO}_{\text{SMI}}$, augmented by a diversification term that encourages the particles to spread; the objective remains an ELBO when the entropic coefficient is set to $1$. By embedding Nonlinear SVGD (NSVGD) within this mixture framework, the authors derive a tractable kernelized gradient that combines attractive forces toward high-likelihood regions with repulsive forces that prevent collapse, enabling efficient uncertainty quantification with fewer particles. Empirically, SMI mitigates variance collapse in small-to-moderate Bayesian neural networks, yields better-calibrated uncertainty on synthetic tasks, UCI benchmarks, and MNIST classification, and needs fewer particles than SVGD. The work establishes a principled, ELBO-based pathway to variational Bayes with mixtures, combining the strengths of density-based and sample-based particle methods for tall-and-wide data.
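To make the mixture objective concrete, the minimal sketch below estimates the standard ELBO, $\mathbb{E}_q[\log p(x,\theta) - \log q(\theta)]$, for a uniform mixture of $m$ one-dimensional Gaussian guides, where each (mean, scale) pair stands in for one particle. It is not the authors' $\text{ELBO}_{\text{SMI}}$ implementation and leaves out the diversification term; the Gaussian guides, the toy normalized target, and the helper names (`log_normal`, `log_mixture`, `mixture_elbo`) are illustrative assumptions.

```python
import math
import random

# Illustrative sketch: helper names are hypothetical, not from the paper's code.


def log_normal(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_mixture(x, components):
    """Log density of a uniform mixture of Gaussians, via log-sum-exp for stability."""
    logs = [log_normal(x, mu, sigma) for mu, sigma in components]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs)) - math.log(len(components))


def mixture_elbo(log_joint, components, n_per_component=1000, seed=0):
    """Monte Carlo ELBO for q(theta) = (1/m) * sum_l N(theta; mu_l, sigma_l^2).

    Because q is a uniform mixture, E_q[f] = (1/m) * sum_l E_{q_l}[f]:
    draw samples from every component and average log p - log q over all of them.
    """
    rng = random.Random(seed)
    total = 0.0
    for mu, sigma in components:
        for _ in range(n_per_component):
            theta = rng.gauss(mu, sigma)
            total += log_joint(theta) - log_mixture(theta, components)
    return total / (len(components) * n_per_component)


def log_target(theta):
    """Toy target (assumption): a normalized N(0, 1), so log Z = 0 and ELBO <= 0."""
    return log_normal(theta, 0.0, 1.0)


# Three identical components that match the target exactly.
matched = mixture_elbo(log_target, [(0.0, 1.0)] * 3)
# Two narrow components placed far from the target's mode.
spread = mixture_elbo(log_target, [(-3.0, 0.5), (3.0, 0.5)])
print(f"matched guide ELBO: {matched:.6f}")
print(f"mismatched guide ELBO: {spread:.3f}")
```

Because the toy target is normalized, the ELBO cannot exceed $\log Z = 0$: the guide whose components all match the target scores 0 up to rounding, while the two components at $\pm 3$ score well below zero (about -4 here).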
Abstract
Stein variational gradient descent (SVGD) [Liu and Wang, 2016] performs approximate Bayesian inference by representing the posterior with a set of particles. However, SVGD suffers from variance collapse, i.e., poor predictions due to underestimating uncertainty [Ba et al., 2021], even for moderately-dimensional models such as small Bayesian neural networks (BNNs). To address this issue, we generalize SVGD by letting each particle parameterize a component distribution in a mixture model. Our method, Stein Mixture Inference (SMI), optimizes a lower bound to the evidence (ELBO) and introduces user-specified guides parameterized by particles. SMI extends the Nonlinear SVGD (NSVGD) framework [Wang and Liu, 2019] to the case of variational Bayes. SMI effectively avoids variance collapse, as measured by a previously proposed test designed for this purpose, and performs well on standard data sets. In addition, SMI requires considerably fewer particles than SVGD to accurately estimate uncertainty for small BNNs. The synergistic combination of NSVGD, ELBO optimization and user-specified guides establishes a promising approach towards variational Bayesian inference in the case of tall and wide data.
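For reference, the SVGD update that SMI generalizes moves each particle $x_i$ along $\phi(x_i) = \frac{1}{n}\sum_j \left[k(x_j, x_i)\nabla \log p(x_j) + \nabla_{x_j} k(x_j, x_i)\right]$ [Liu and Wang, 2016]. The minimal one-dimensional sketch below, assuming an RBF kernel with a fixed bandwidth `h` and a standard normal toy target, separates the attractive term (kernel-weighted score) from the repulsive term (kernel gradient). It is plain SVGD, not SMI, and the names `rbf`, `svgd_step`, and `score_std_normal` are hypothetical.

```python
import math

# Illustrative sketch of plain SVGD; helper names are hypothetical.


def rbf(x, y, h):
    """RBF kernel k(x, y) = exp(-(x - y)^2 / h) with bandwidth h."""
    return math.exp(-((x - y) ** 2) / h)


def svgd_step(particles, grad_log_p, h=1.0, step=0.1):
    """One SVGD update (Liu and Wang, 2016) for 1-D particles.

    phi(x_i) = (1/n) * sum_j [k(x_j, x_i) * grad log p(x_j) + d/dx_j k(x_j, x_i)]
    """
    n = len(particles)
    scores = [grad_log_p(x) for x in particles]
    updated = []
    for xi in particles:
        phi = 0.0
        for xj, sj in zip(particles, scores):
            k = rbf(xj, xi, h)
            # Attractive term: kernel-weighted scores drag xi toward high density.
            phi += k * sj
            # Repulsive term: d/dx_j k(x_j, x_i) = -2 (x_j - x_i) / h * k,
            # which pushes xi away from nearby particles x_j.
            phi += -2.0 * (xj - xi) / h * k
        updated.append(xi + step * phi / n)
    return updated


def score_std_normal(x):
    """Score of the toy target N(0, 1) (assumption): d/dx log p(x) = -x."""
    return -x


# Start all particles far from the mode, in [3.0, 4.9].
particles = [3.0 + 0.1 * i for i in range(20)]
for _ in range(1000):
    particles = svgd_step(particles, score_std_normal)

mean = sum(particles) / len(particles)
std = math.sqrt(sum((x - mean) ** 2 for x in particles) / len(particles))
print(f"mean={mean:.3f} std={std:.3f}")
```

Deleting the repulsive line turns the update into kernel-smoothed gradient ascent on $\log p$, and every particle collapses onto the mode at 0. In this one-dimensional toy the repulsive term keeps the particles spread out; the variance collapse described above shows up in higher-dimensional models such as small BNNs.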
