Variational Inference with Mixtures of Isotropic Gaussians
Marguerite Petit-Talamon, Marc Lambert, Anna Korba
TL;DR
This work introduces a variational-inference framework based on mixtures of isotropic Gaussians (MIG) with uniform weights to approximate multimodal posteriors efficiently. It develops two geometric optimization schemes on the manifold of isotropic Gaussians, Bures-Wasserstein gradient descent and entropic mirror descent, and then extends them to the multi-component MIG family with joint updates of the means and variances. The paper provides explicit gradient formulas, a JKO-inspired scheme for mixtures, and two practical update rules (IBW and MD) that keep the variances positive and enable scalable inference. Empirical results on synthetic mixtures and Bayesian posterior tasks show that increasing the number of components improves multimodal coverage at modest computational overhead, and that the method outperforms several baselines in accuracy and efficiency. The approach balances expressivity and tractability for variational inference in multimodal settings; theoretical guarantees and broader applicability are left as directions for future work.
Abstract
Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. In this paper, we focus on the following parametric family: mixtures of isotropic Gaussians (i.e., with diagonal covariance matrices proportional to the identity) with uniform weights. We develop a variational framework and provide efficient algorithms suited to this family. In contrast with mixtures of Gaussians with generic covariance matrices, this choice strikes a balance between approximating multimodal Bayesian posteriors accurately and remaining efficient in memory and computation. Our algorithms implement gradient descent on the locations of the mixture components (the modes of the Gaussians), and either entropic mirror descent or Bures descent on their variance parameters. We illustrate the performance of our algorithms in numerical experiments.
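The family and the type of update described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names (`mig_log_density`, `mig_score`, `vi_step`), the reparameterized Monte Carlo gradient estimator, the shared step size, and the sample count are assumptions made for illustration. The variance update is a generic exponentiated-gradient (entropic mirror) step, which may differ from the exact rule derived in the paper. The target is assumed to be available through the gradient of its potential V = -log π, passed as the hypothetical argument `grad_potential`.

```python
import numpy as np


def mig_log_density(x, means, variances):
    """Log-density of a uniform-weight mixture of isotropic Gaussians.

    x: (B, d) points, means: (N, d), variances: (N,) positive scalars.
    """
    d = means.shape[1]
    # Squared distance between every point and every component mean: (B, N).
    sq = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    log_comp = -0.5 * sq / variances - 0.5 * d * np.log(2 * np.pi * variances)
    # Stable log of (1/N) * sum_i exp(log_comp_i).
    mx = log_comp.max(axis=1, keepdims=True)
    return mx[:, 0] + np.log(np.exp(log_comp - mx).sum(axis=1)) - np.log(len(means))


def mig_score(x, means, variances):
    """Gradient of the mixture log-density with respect to x, shape (B, d)."""
    d = means.shape[1]
    diff = x[:, None, :] - means[None, :, :]                     # (B, N, d)
    sq = (diff ** 2).sum(axis=-1)                                # (B, N)
    log_comp = -0.5 * sq / variances - 0.5 * d * np.log(2 * np.pi * variances)
    resp = np.exp(log_comp - log_comp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)                      # responsibilities
    return -(resp[:, :, None] * diff / variances[None, :, None]).sum(axis=1)


def vi_step(means, variances, grad_potential, rng, step=0.05, n_samples=64):
    """One illustrative update of the reverse KL over means and variances.

    Assumption: the constant 1/N factor of the gradient is absorbed in `step`.
    """
    n_comp, d = means.shape
    eps = rng.standard_normal((n_comp, n_samples, d))
    # Reparameterized samples x = m_i + sqrt(v_i) * eps from each component.
    x = means[:, None, :] + np.sqrt(variances)[:, None, None] * eps
    flat = x.reshape(-1, d)
    # Gradient in x of log q(x) + V(x), evaluated at the samples.
    g = (mig_score(flat, means, variances) + grad_potential(flat)).reshape(x.shape)
    grad_m = g.mean(axis=1)                                      # (N, d)
    grad_v = (g * eps).sum(axis=-1).mean(axis=1) / (2 * np.sqrt(variances))
    new_means = means - step * grad_m                            # gradient descent
    new_vars = variances * np.exp(-step * grad_v)                # stays positive
    return new_means, new_vars
```

For example, with a standard Gaussian target one can pass `grad_potential=lambda x: x` and call `vi_step` repeatedly with a seeded `np.random.default_rng`. The multiplicative form of the variance update is what keeps every variance positive without any projection or clipping. Each component stores only a mean vector and one scalar, so memory grows as N(d + 1) rather than with the d × d covariance matrices of a generic Gaussian mixture.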
