Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians

Tom Huix; Anna Korba; Alain Durmus; Eric Moulines

TL;DR

The paper investigates theoretical guarantees for variational inference when the variational family is a fixed-variance Gaussian mixture with equal weights. It introduces the mollified relative entropy $\mathcal{F}_{\epsilon}$, linking VI to a Wasserstein gradient flow that yields an interacting-particle system updating mixture means; this provides a tractable optimization framework with Monte Carlo evaluation of the gradient. A descent lemma ensures objective decrease at each iteration under smoothness assumptions, and a nonasymptotic KL-quantization bound shows how increasing the number of mixture components reduces approximation error to the target, with rates tied to the mollification parameter $\epsilon$ and problem size. The results illuminate the theoretical properties of VI beyond Gaussian families and guide design choices for multi-modal posterior approximations, while acknowledging the limitations of fixed weights and fixed covariances and outlining directions for extending to weight optimization and variable covariances.
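For concreteness, the objective and the particle update mentioned above can be written out as follows. This is our own sketch of the standard first-variation computation, assuming that the Gaussian kernel $k_\epsilon$ is the density of $\mathcal{N}(0,\epsilon I_d)$, that the target is $\mu^\star \propto e^{-V}$, and that $\gamma>0$ is a step size; the paper's exact parameterization and constants may differ.

$$
\begin{aligned}
\mu &= \frac{1}{N}\sum_{i=1}^{N}\delta_{x_i}, \qquad
k_\epsilon * \mu = \frac{1}{N}\sum_{i=1}^{N}\mathcal{N}(x_i,\epsilon I_d), \qquad
\mathcal{F}_\epsilon(\mu) = \mathop{\mathrm{KL}}\nolimits(k_\epsilon * \mu \,|\, \mu^\star),\\
\mathcal{F}_\epsilon'(\mu)(x) &= \int k_\epsilon(y-x)\,\big[\log (k_\epsilon*\mu)(y) + V(y)\big]\,\mathrm{d}y + \text{const},\\
\nabla\mathcal{F}_\epsilon'(\mu)(x) &= \mathbb{E}_{Z\sim\mathcal{N}(0,\epsilon I_d)}\big[\nabla\log (k_\epsilon*\mu)(x+Z) + \nabla V(x+Z)\big],\\
x_i^{l+1} &= x_i^{l} - \gamma\,\nabla\mathcal{F}_\epsilon'(\mu_l)(x_i^{l}), \qquad i = 1,\dots,N.
\end{aligned}
$$

The expectation over $Z$ is the part evaluated by Monte Carlo, and the term $\nabla\log(k_\epsilon*\mu)$ depends on all the means $x_j$, which is what couples the particles.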

Abstract

Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family by minimizing a loss, typically the (reverse) Kullback-Leibler (KL) divergence. Despite its empirical success, the theoretical properties of VI have received attention only recently, and mostly when the parametric family is Gaussian. This work aims to contribute to the theoretical study of VI in the non-Gaussian case by investigating the setting of mixtures of Gaussians with fixed covariance and constant weights. In this view, VI over this specific family can be cast as the minimization of a mollified relative entropy, i.e. the KL between the convolution (with respect to a Gaussian kernel) of an atomic measure supported on Diracs and the target distribution. The support of the atomic measure corresponds to the locations of the Gaussian components. Hence, solving the variational inference problem becomes equivalent to optimizing the positions of the Diracs (the particles), which can be done through gradient descent and takes the form of an interacting particle system. We study two sources of error of variational inference in this context when optimizing the mollified relative entropy. The first is an optimization result, namely a descent lemma establishing that the algorithm decreases the objective at each iteration. The second is an approximation error that upper bounds the objective, i.e. the KL divergence between an optimal finite mixture and the target distribution.
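The particle scheme described in the abstract can be sketched in Python/NumPy as below. This is a minimal sketch under our own assumptions, not the authors' implementation: the Gaussian kernel is taken to be $\mathcal{N}(0,\epsilon I_d)$, the gradient $\nabla V$ of the target potential is supplied by the user, the gradient at each particle is estimated by plain Monte Carlo over the kernel noise, and the function names (`mixture_log_density_and_score`, `wasserstein_grad`, `particle_vi`) are hypothetical. Each mean moves along $-\mathbb{E}_Z[\nabla\log(k_\epsilon*\mu)(x_i+Z)+\nabla V(x_i+Z)]$; the score of the mixture couples the particles. The loop also records the estimated squared $L^2(\mu_l)$ norm of the gradient, the quantity whose running average appears in Figure 2.

```python
import numpy as np


def mixture_log_density_and_score(y, means, eps):
    """Log-density and score (gradient of the log-density) at points y of the
    equal-weight mixture (1/N) * sum_j N(means[j], eps * I_d)."""
    n_comp, d = means.shape
    diff = y[:, None, :] - means[None, :, :]              # (M, N, d)
    logits = -np.sum(diff**2, axis=-1) / (2.0 * eps)      # (M, N)
    top = logits.max(axis=1, keepdims=True)               # log-sum-exp stabilisation
    w = np.exp(logits - top)
    total = w.sum(axis=1, keepdims=True)
    log_dens = (
        (top + np.log(total)).ravel()
        - np.log(n_comp)
        - 0.5 * d * np.log(2.0 * np.pi * eps)
    )
    resp = w / total                                      # component responsibilities
    # score(y) = sum_j resp_j(y) * (means_j - y) / eps
    score = -np.einsum("mn,mnd->md", resp, diff) / eps
    return log_dens, score


def wasserstein_grad(means, grad_V, eps, n_mc, rng):
    """Monte Carlo estimate, at every particle x_i, of
    E_Z[grad log(k_eps * mu)(x_i + Z) + grad V(x_i + Z)] with Z ~ N(0, eps * I_d)."""
    n_comp, d = means.shape
    z = np.sqrt(eps) * rng.standard_normal((n_comp, n_mc, d))
    y = (means[:, None, :] + z).reshape(-1, d)
    _, score = mixture_log_density_and_score(y, means, eps)
    g = score + grad_V(y)
    return g.reshape(n_comp, n_mc, d).mean(axis=1)


def particle_vi(means0, grad_V, eps, step, n_iter, n_mc=64, seed=0):
    """Gradient descent on the component means (hypothetical helper). Also returns
    the estimated squared L2(mu_l) norm of the gradient at each iteration."""
    rng = np.random.default_rng(seed)
    x = np.array(means0, dtype=float)
    sq_norms = np.empty(n_iter)
    for it in range(n_iter):
        g = wasserstein_grad(x, grad_V, eps, n_mc, rng)
        sq_norms[it] = np.mean(np.sum(g**2, axis=1))      # (1/N) sum_i |g_i|^2
        x = x - step * g
    return x, sq_norms


# Usage: Gaussian target N(m, I_2), i.e. V(x) = |x - m|^2 / 2 and grad V(y) = y - m.
m = np.array([2.0, -1.0])
x0 = np.random.default_rng(1).standard_normal((10, 2))
means, sq_norms = particle_vi(x0, lambda y: y - m, eps=0.5, step=0.1, n_iter=200)
running_avg = np.cumsum(sq_norms) / np.arange(1, sq_norms.size + 1)
print(means.mean(axis=0))   # centroid of the fitted means; should be close to m
print(running_avg[-1])      # (1/L) sum_l of the estimated squared gradient norms
```

For this Gaussian target the score of the mixture integrates to zero under the mixture itself, so the centroid of the means is driven towards $m$ while the score term spreads the particles apart.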

Paper Structure (21 sections, 13 theorems, 106 equations, 3 figures)

Introduction
The mollified relative entropy
Algorithm
Non-smoothness of the KL
Optimization Guarantees
Approximation Guarantees
Related work
Conclusion
Particle implementation of the gradient flow
Mixture of Gaussians optimization
Lemmas for the proof of Theorem 7
Proof of Proposition 4
Wasserstein Hessians of relative entropies
Proof of Proposition 2
Hessian of the mollified relative entropy
...and 6 more sections

Key Result

Proposition 2

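(villani2021topics) Assume that $\mu^{\star}$ has a density $\mu^{\star}\propto e^{-V}$, where the potential $V:\mathbb R^d \to \mathbb R$ is $C^2(\mathbb R^d)$. Then the Hessian of $\mathop{\mathrm{KL}}\nolimits(\cdot|\mu^{\star})$ at $\mu$ is given, for any $\psi \in C_c^{\infty}(\mathbb R^d)$, by
$$\mathrm{Hess}_{\mu}\mathop{\mathrm{KL}}\nolimits(\cdot|\mu^{\star})(\nabla\psi,\nabla\psi)=\int \langle \mathrm{H}_V(x)\nabla\psi(x),\nabla\psi(x)\rangle\,\mathrm{d}\mu(x)+\int \|\mathrm{H}_{\psi}(x)\|_{\mathrm{HS}}^2\,\mathrm{d}\mu(x),$$
where $\mathrm{H}_V$ and $\mathrm{H}_{\psi}$ are the Hessians of $V$ and $\psi$, and $\|\cdot\|_{\mathrm{HS}}$ is the Hilbert-Schmidt norm.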
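A direct consequence of this Hessian formula, written out here as an illustrative worked step (a standard argument in Otto calculus, not quoted from the paper): if $\mathrm{H}_V \succeq \lambda I_d$ for some $\lambda\in\mathbb R$, then the Hilbert-Schmidt term is nonnegative and the potential term is bounded below, so that, with $\mathrm{H}_\psi$ the Hessian of $\psi$,

$$
\mathrm{Hess}_{\mu}\mathop{\mathrm{KL}}\nolimits(\cdot|\mu^{\star})(\nabla\psi,\nabla\psi)
= \int \langle \mathrm{H}_V\nabla\psi,\nabla\psi\rangle\,\mathrm{d}\mu + \int \|\mathrm{H}_{\psi}\|_{\mathrm{HS}}^2\,\mathrm{d}\mu
\;\ge\; \lambda\int \|\nabla\psi\|^2\,\mathrm{d}\mu
= \lambda\,\|\nabla\psi\|^2_{L^2(\mu)}.
$$

For instance, a standard Gaussian target has $V(x)=\|x\|^2/2$, hence $\mathrm{H}_V = I_d$ and $\lambda = 1$.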

Figures (3)

Figure 1: Second moment along Wasserstein gradient descent iterations.
Figure 2: Illustration of the rate of $\frac{1}{L} \sum_{l=1}^{L} \|\nabla \mathcal{F}_{\epsilon}'(\mu_l)\|^2_{L^2(\mu_l)}$ derived in Corollary 5.
Figure 3: Illustration of the rates of Theorem 7, where $\nu_n=\mathop{\mathrm{argmin}}\limits_{\nu \in \mathcal{C}_n}\mathop{\mathrm{KL}}\nolimits(\nu| \mu^{\star})$ is approximated by $\tilde{\nu}_n$.
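Figure 3 compares mixtures through their KL divergence to $\mu^{\star}$. When the target's log-density is available together with its normalizing constant, such a KL can be estimated by plain Monte Carlo, sampling from the mixture by drawing a component uniformly and adding $\mathcal{N}(0,\epsilon I_d)$ noise. The sketch below illustrates this under our own assumptions: the kernel parameterization $\mathcal{N}(0,\epsilon I_d)$ is assumed, and `mixture_log_density`, `mixture_kl` and `std_normal_logpdf` are hypothetical helper names, not part of the paper.

```python
import numpy as np


def mixture_log_density(y, means, eps):
    """Log-density at points y of (1/N) * sum_j N(means[j], eps * I_d)."""
    n_comp, d = means.shape
    sq = np.sum((y[:, None, :] - means[None, :, :]) ** 2, axis=-1)  # (M, N)
    logits = -sq / (2.0 * eps)
    top = logits.max(axis=1, keepdims=True)                         # log-sum-exp
    lse = (top + np.log(np.exp(logits - top).sum(axis=1, keepdims=True))).ravel()
    return lse - np.log(n_comp) - 0.5 * d * np.log(2.0 * np.pi * eps)


def mixture_kl(means, eps, log_target, n_samples=100_000, seed=0):
    """Monte Carlo estimate of KL(k_eps * mu | mu_star) for mu = (1/N) sum_j delta_{means[j]}.
    log_target must be the *normalized* log-density of mu_star."""
    rng = np.random.default_rng(seed)
    n_comp, d = means.shape
    idx = rng.integers(n_comp, size=n_samples)                      # uniform component choice
    y = means[idx] + np.sqrt(eps) * rng.standard_normal((n_samples, d))
    return float(np.mean(mixture_log_density(y, means, eps) - log_target(y)))


def std_normal_logpdf(y):
    """Normalized log-density of N(0, I_d), evaluated row-wise."""
    return -0.5 * np.sum(y**2, axis=1) - 0.5 * y.shape[1] * np.log(2.0 * np.pi)


# Sanity check in d = 1 with a single component at the origin, where the closed form
# KL(N(0, eps) | N(0, 1)) = (eps - 1 - log eps) / 2 is available.
eps = 0.5
print(mixture_kl(np.zeros((1, 1)), eps, std_normal_logpdf))
print(0.5 * (eps - 1.0 - np.log(eps)))
```

The same estimator can be applied to fitted means for several numbers of components, which is the kind of comparison the figure reports; for targets known only up to a constant, the estimate is shifted by the unknown log-normalizer.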

Theorems & Definitions (26)

Remark 1
Proposition 2
Proposition 3
Proposition 4
Proof of Proposition 4
Corollary 5
Remark 6
Theorem 7
Proof of Theorem 7
Lemma 8
...and 16 more
