Stochastic variance-reduced Gaussian variational inference on the Bures-Wasserstein manifold
Hoang Phuc Hau Luu, Hanlin Yu, Bernardo Williams, Marcelo Hartmann, Arto Klami
TL;DR
This work tackles Gaussian variational inference in the Bures–Wasserstein (BW) geometry, targeting the high variance of the stochastic estimates of the forward BW gradient required by forward–backward Euler optimization. It introduces SVRGVI, a variance-reduced estimator built on a Stein-style control variate $Z_k=\Sigma_k^{-1}(X_k-m_k)$ with an adaptive coefficient $c$. The estimator needs no extra samples and adds only $O(d^2)$ cost per iteration, because it reuses the Cholesky factor. The authors prove that the estimator reduces variance in a neighborhood of the optimal solution and, under strong convexity, whenever a trace condition holds; they also show that variance reduction tightens the convergence bounds. Empirically, SVRGVI substantially outperforms BWGD and SGVI on Gaussian, Student's t, and Bayesian logistic regression targets. The approach preserves the favorable properties of the BW geometry while improving accuracy and stability by orders of magnitude, making BW-geometry Gaussian VI practical for high-dimensional Bayesian inference.
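A minimal NumPy sketch of the control-variate idea, under stated assumptions: $\Sigma_k = LL^\top$ with $L$ the lower Cholesky factor, samples are drawn as $X_k = m_k + L\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$, and the sketch estimates only the expected gradient $\mathbb{E}_q[\nabla V(X)]$ (one of the expectations in the forward step). Because $\Sigma_k^{-1}(X_k - m_k) = L^{-\top}\varepsilon$, the control variate costs one triangular solve ($O(d^2)$) and no extra sample, and since $\mathbb{E}[Z_k]=0$ the estimate $\nabla V(X_k) - cZ_k$ stays unbiased for any $c$. The names `solve_upper` and `cv_gradient_estimate` are hypothetical, and $c$ is passed in as a fixed scalar; the paper's adaptive rule for $c$ and its treatment of the Hessian term are not reproduced here.

```python
import numpy as np


def solve_upper(U, b):
    """Back-substitution for U x = b, U upper triangular: O(d^2) operations."""
    d = b.shape[0]
    x = np.zeros(d)
    for i in range(d - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x


def cv_gradient_estimate(m, L, grad_V, c, rng):
    """Single-sample estimates of E_q[grad V(X)] for q = N(m, L L^T).

    Returns (plain, controlled) computed from the same draw.
    Hypothetical helper; c is a fixed scalar, not the paper's adaptive rule.
    """
    eps = rng.standard_normal(m.shape[0])
    x = m + L @ eps               # reparameterized draw X ~ N(m, Sigma)
    z = solve_upper(L.T, eps)     # Z = Sigma^{-1}(X - m) = L^{-T} eps, E[Z] = 0
    g = grad_V(x)
    return g, g - c * z           # unbiased for any c because E[Z] = 0


# Usage: Gaussian target N(mu, P^{-1}) whose precision P is close to Sigma^{-1}.
rng = np.random.default_rng(0)
d = 3
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)
L = np.linalg.cholesky(Sigma)     # lower-triangular, Sigma = L @ L.T
m, mu = np.zeros(d), np.ones(d)
P = np.linalg.inv(Sigma) + 0.01 * np.eye(d)

draws = [cv_gradient_estimate(m, L, lambda x: P @ (x - mu), 1.0, rng)
         for _ in range(5000)]
plain = np.array([g for g, _ in draws])
controlled = np.array([h for _, h in draws])
print("plain variance:     ", plain.var(axis=0).sum())
print("controlled variance:", controlled.var(axis=0).sum())
```

With $c = 1$ and a target precision close to $\Sigma_k^{-1}$ (as near the optimum for a Gaussian target), the controlled estimate has much smaller variance than the plain single-sample one. This illustrates, but does not prove, the neighborhood result stated above.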
Abstract
Optimization in the Bures-Wasserstein space has been gaining popularity in the machine learning community because it connects variational inference with Wasserstein gradient flows. The variational inference objective, the Kullback-Leibler divergence, can be written as the sum of the negative entropy and the potential energy, which makes forward-backward Euler the method of choice. Notably, the backward step admits a closed-form solution in this case, which makes the scheme practical. However, the forward step is not exact, since the Bures-Wasserstein gradient of the potential energy involves "intractable" expectations. Recent approaches approximate these terms with the Monte Carlo method (in practice, a single-sample estimator), which results in high variance and poor performance. We propose a novel variance-reduced estimator based on the principle of control variates. We theoretically show that this estimator has a smaller variance than the Monte Carlo estimator in scenarios of interest. We also prove that variance reduction improves the optimization bounds of the current analysis. We demonstrate that the proposed estimator achieves order-of-magnitude improvements over previous Bures-Wasserstein methods.
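With $\pi \propto e^{-V}$, $\mathrm{KL}(q\,\|\,\pi) = \mathbb{E}_q[V] + \mathbb{E}_q[\log q] + \text{const}$, so the potential energy $\mathbb{E}_q[V]$ receives an explicit (forward) step and the negative entropy $\mathbb{E}_q[\log q]$ an implicit (backward) step. The sketch below assumes the forward step $m \leftarrow m - \eta\,\mathbb{E}_q[\nabla V]$, $\Sigma_{1/2} = (I - \eta\,\mathbb{E}_q[\nabla^2 V])\,\Sigma\,(I - \eta\,\mathbb{E}_q[\nabla^2 V])$ and the closed-form backward step $\Sigma \leftarrow \tfrac12\big(\Sigma_{1/2} + 2\eta I + [\Sigma_{1/2}(\Sigma_{1/2} + 4\eta I)]^{1/2}\big)$ of the JKO-based forward-backward Gaussian VI scheme; these formulas are an assumption, not stated in this abstract. The function names are illustrative, and `single_sample_estimates` is a hypothetical helper for the single-sample Monte Carlo baseline whose variance the paper targets.

```python
import numpy as np


def psd_sqrt(A):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(0.5 * (A + A.T))
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T


def fb_euler_step(m, Sigma, g, H, eta):
    """One forward-backward step given g ~ E_q[grad V] and H ~ E_q[hess V]."""
    I = np.eye(m.shape[0])
    # Forward (explicit) step on the potential energy E_q[V].
    m_new = m - eta * g
    M = I - eta * H
    S = M @ Sigma @ M
    # Backward (implicit, JKO) step on the negative entropy: closed form.
    # S and S + 4*eta*I commute, so their product is symmetric PSD.
    Sigma_new = 0.5 * (S + 2.0 * eta * I + psd_sqrt(S @ (S + 4.0 * eta * I)))
    return m_new, 0.5 * (Sigma_new + Sigma_new.T)


def single_sample_estimates(m, Sigma, grad_V, hess_V, rng):
    """Hypothetical single-sample Monte Carlo estimates of E_q[grad V], E_q[hess V]."""
    L = np.linalg.cholesky(Sigma)
    x = m + L @ rng.standard_normal(m.shape[0])
    return grad_V(x), hess_V(x)


# Usage: quadratic V(x) = 0.5 (x - mu)^T P (x - mu), target N(mu, P^{-1}).
mu = np.array([1.0, -1.0])
P = np.array([[2.0, 0.5], [0.5, 1.0]])
m, Sigma = np.zeros(2), np.eye(2)
for _ in range(2000):
    # Exact expectations for quadratic V; swap in single_sample_estimates
    # to get the noisy baseline.
    m, Sigma = fb_euler_step(m, Sigma, P @ (m - mu), P, eta=0.1)
print(m, Sigma, np.linalg.inv(P), sep="\n")
```

With exact expectations for a Gaussian target the iterates approach $\mathcal{N}(\mu, P^{-1})$, which is a fixed point of the update. For a quadratic $V$ the Hessian is constant, so replacing the exact terms with `single_sample_estimates` makes only the mean update noisy; for non-Gaussian targets both expectations become noisy.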
