Wasserstein Gradient Flow over Variational Parameter Space for Variational Inference
Dai Hai Nguyen, Tetsuya Sakurai, Hiroshi Mamitsuka
TL;DR
This paper addresses variational inference by redefining the optimization domain from latent variables to the variational-parameter space and solving it with Wasserstein gradient flows (WGF). By showing that BBVI and NGVI are special cases of WGF, the authors introduce GFlowVI and NGFlowVI, which represent the variational posterior as a mixture of components updated via preconditioned gradient flows and mirror-descent weight updates. They establish continuous-time descent properties and provide discrete-time, particle-based algorithms, including a simple MD-based mechanism to adapt component weights, and a practical fix for negative Hessians using constrained mirror maps. Empirical results on synthetic distributions and Bayesian neural networks demonstrate faster convergence, improved approximation of multimodal posteriors, and favorable computational scaling compared with kernel-based methods, highlighting the method's flexibility and potential for handling complex variational families.
Abstract
Variational inference (VI) can be cast as an optimization problem in which the variational parameters are tuned to closely align a variational distribution with the true posterior. This optimization can be approached through vanilla gradient descent in black-box VI or natural-gradient descent in natural-gradient VI. In this work, we reframe VI as the optimization of an objective over probability distributions defined on a variational parameter space. We then propose Wasserstein gradient descent for solving this optimization problem. Notably, black-box VI and natural-gradient VI can both be reinterpreted as specific instances of the proposed Wasserstein gradient descent. To enhance the efficiency of optimization, we develop practical methods for numerically solving the discrete gradient flows. We validate the effectiveness of the proposed methods through empirical experiments on a synthetic dataset, supplemented by theoretical analyses.
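As a rough illustration of optimizing over distributions on the variational parameter space, the Python sketch below treats the parameters of each mixture component (mean and log standard deviation) as a particle and moves every particle along the negative gradient of KL(q || target). Everything concrete here is an assumption made for illustration and is not taken from the paper: the 1-D bimodal target, the equal-weight Gaussian mixture family, the Gauss-Hermite quadrature used in place of Monte Carlo sampling, the step size, and the function names `mixture_score`, `flow_step`, and `run_flow`. It is not the authors' implementation; weight adaptation and preconditioning are omitted. With a single particle the update reduces to plain gradient descent on the variational parameters, which is the sense in which black-box VI can be read as a special case.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Hypothetical 1-D target: an equal-weight mixture of two Gaussians.
TARGET_MEANS = np.array([-2.0, 2.0])
TARGET_STDS = np.array([0.5, 0.5])


def mixture_score(x, means, stds):
    """Return d/dx log p(x) for an equal-weight Gaussian mixture p, at each x."""
    z = (x[:, None] - means[None, :]) / stds[None, :]
    log_comp = -0.5 * z**2 - np.log(stds)[None, :]
    log_comp -= log_comp.max(axis=1, keepdims=True)  # stabilise the softmax
    resp = np.exp(log_comp)
    resp /= resp.sum(axis=1, keepdims=True)          # component responsibilities
    return (resp * (-z / stds[None, :])).sum(axis=1)


def flow_step(mus, log_stds, nodes, weights, step_size):
    """One explicit Euler step for all particles theta_k = (mu_k, log sigma_k)."""
    stds = np.exp(log_stds)
    new_mus = np.empty_like(mus)
    new_log_stds = np.empty_like(log_stds)
    for k in range(len(mus)):
        # Reparameterised points from component k: x = mu_k + sigma_k * eps.
        x = mus[k] + stds[k] * nodes
        # Score difference d/dx [log q(x) - log target(x)] for the full mixture q.
        g = mixture_score(x, mus, stds) - mixture_score(x, TARGET_MEANS, TARGET_STDS)
        # Path-derivative gradients; the explicit dependence of log q on theta_k
        # is dropped because its expectation under q is zero.
        grad_mu = np.sum(weights * g)
        grad_log_std = np.sum(weights * g * stds[k] * nodes)
        new_mus[k] = mus[k] - step_size * grad_mu
        new_log_stds[k] = log_stds[k] - step_size * grad_log_std
    return new_mus, new_log_stds


def run_flow(num_steps=2000, step_size=0.05, num_nodes=40):
    """Run the particle updates from a fixed, hypothetical initialisation."""
    nodes, weights = hermegauss(num_nodes)
    weights = weights / weights.sum()  # weights now average over eps ~ N(0, 1)
    mus = np.array([-1.0, 1.0])        # two particles, i.e. two mixture components
    log_stds = np.zeros(2)
    for _ in range(num_steps):
        mus, log_stds = flow_step(mus, log_stds, nodes, weights, step_size)
    return mus, np.exp(log_stds)


if __name__ == "__main__":
    final_mus, final_stds = run_flow()
    print("component means:", final_mus)
    print("component stds:", final_stds)
```

Calling `run_flow()` returns the final component means and standard deviations. Under the assumptions above, they are expected to approach the target's modes at -2 and 2 with standard deviations close to 0.5.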
