Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians
Tom Huix, Anna Korba, Alain Durmus, Eric Moulines
TL;DR
The paper investigates theoretical guarantees for variational inference when the variational family is a fixed-variance Gaussian mixture with equal weights. It introduces the mollified relative entropy $\mathcal{F}_{\epsilon}$, linking VI to a Wasserstein gradient flow that yields an interacting-particle system updating mixture means; this provides a tractable optimization framework with Monte Carlo evaluation of the gradient. A descent lemma ensures objective decrease at each iteration under smoothness assumptions, and a nonasymptotic KL-quantization bound shows how increasing the number of mixture components reduces approximation error to the target, with rates tied to the mollification parameter $\epsilon$ and problem size. The results illuminate the theoretical properties of VI beyond Gaussian families and guide design choices for multi-modal posterior approximations, while acknowledging the limitations of fixed weights and fixed covariances and outlining directions for extending to weight optimization and variable covariances.
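For concreteness, the objective and its gradient can be written out as follows. This is a sketch under stated assumptions rather than the paper's exact statement: the symbols $\pi$ (target density), $k_{\epsilon}$ (Gaussian kernel), $x_1,\dots,x_N$ (mixture means), and the choice of covariance $\epsilon^2 I_d$ for the kernel are notation introduced here for illustration.

$$
\mu_N = \frac{1}{N}\sum_{i=1}^{N}\delta_{x_i},
\qquad
k_{\epsilon}\star\mu_N = \frac{1}{N}\sum_{i=1}^{N}\mathcal{N}(x_i,\epsilon^2 I_d),
\qquad
\mathcal{F}_{\epsilon}(\mu_N) = \mathrm{KL}\left(k_{\epsilon}\star\mu_N \,\middle\|\, \pi\right).
$$

Assuming enough regularity to differentiate under the integral and integrate by parts, the gradient with respect to one mean is an expectation under the corresponding component, which is what makes a Monte Carlo estimate possible:

$$
\nabla_{x_j}\mathcal{F}_{\epsilon}(\mu_N)
= \frac{1}{N}\,\mathbb{E}_{y\sim\mathcal{N}(x_j,\epsilon^2 I_d)}
\left[\nabla\log\left(k_{\epsilon}\star\mu_N\right)(y) - \nabla\log\pi(y)\right].
$$

The term $\nabla\log(k_{\epsilon}\star\mu_N)$ depends on all the means, which is the source of the interaction between particles.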
Abstract
Variational inference (VI) is a popular approach in Bayesian inference that seeks the best approximation of the posterior distribution within a parametric family by minimizing a loss, typically the (reverse) Kullback-Leibler (KL) divergence. Despite its empirical success, the theoretical properties of VI have only recently received attention, and mostly when the parametric family is the Gaussian family. This work aims to contribute to the theoretical study of VI in the non-Gaussian case by investigating mixtures of Gaussians with fixed covariance and constant weights. From this viewpoint, VI over this specific family can be cast as the minimization of a mollified relative entropy, i.e. the KL divergence between the convolution (with respect to a Gaussian kernel) of an atomic measure, that is, a sum of Diracs, and the target distribution. The support of the atomic measure corresponds to the locations of the Gaussian components. Hence, solving variational inference becomes equivalent to optimizing the positions of the Diracs (the particles), which can be done through gradient descent and takes the form of an interacting particle system. We study two sources of error of variational inference in this context when optimizing the mollified relative entropy. The first is an optimization result: a descent lemma establishing that the algorithm decreases the objective at each iteration. The second is an approximation result: an upper bound on the objective evaluated between an optimal finite mixture and the target distribution.
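The particle update described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `mixture_score` and `vi_step`, the bimodal target, the kernel covariance `eps**2 * I`, and the values of the step size and sample count are all assumptions made for illustration. Each mean is moved along a Monte Carlo estimate of the expected difference between the score of the current mixture and the score of the target, with samples drawn from that mean's own Gaussian component.

```python
import numpy as np


def mixture_score(y, means, eps):
    """Score (gradient of the log-density) at points y of the equal-weight
    mixture (1/N) * sum_i N(means[i], eps^2 * I)."""
    diff = y[:, None, :] - means[None, :, :]             # (M, N, d)
    logw = -0.5 * np.sum(diff ** 2, axis=-1) / eps ** 2  # (M, N)
    logw -= logw.max(axis=1, keepdims=True)              # stabilise the softmax
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)                    # component responsibilities
    return -np.einsum("mn,mnd->md", w, diff) / eps ** 2  # (M, d)


def vi_step(means, target_score, eps, step, n_samples, rng):
    """One gradient-descent step on the mixture means.

    The direction used is N times the gradient of the objective with respect
    to each mean (the 1/N factor is absorbed into the step size, an
    assumption of this sketch).
    """
    n, d = means.shape
    noise = rng.standard_normal((n, n_samples, d))
    # Samples from each component N(means[j], eps^2 * I), flattened to (n * n_samples, d).
    y = (means[:, None, :] + eps * noise).reshape(-1, d)
    # Score of the current mixture minus score of the target at each sample.
    g = mixture_score(y, means, eps) - target_score(y)
    grad = g.reshape(n, n_samples, d).mean(axis=1)       # (n, d) Monte Carlo estimate
    return means - step * grad


# Hypothetical target: equal-weight mixture of two unit-variance Gaussians in 2D.
target_means = np.array([[-3.0, 0.0], [3.0, 0.0]])


def target_score(y):
    return mixture_score(y, target_means, 1.0)


rng = np.random.default_rng(0)
means = 0.5 * rng.standard_normal((8, 2))                # 8 particles near the origin
for _ in range(300):
    means = vi_step(means, target_score, eps=1.0, step=0.1, n_samples=64, rng=rng)
```

The attraction toward the target comes from `target_score`, while subtracting the mixture's own score makes the particles repel each other. The step size and number of samples here are arbitrary. The descent lemma mentioned above concerns how the step size must relate to the smoothness of the problem, which this sketch does not check.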
