Bures-Wasserstein Importance-Weighted Evidence Lower Bound: Exposition and Applications

Peiwen Jiang; Takuo Matsubara; Minh-Ngoc Tran

Bures-Wasserstein Importance-Weighted Evidence Lower Bound: Exposition and Applications

Peiwen Jiang, Takuo Matsubara, Minh-Ngoc Tran

TL;DR

The paper tackles the instability of IW-ELBO optimization in Euclidean space by recasting variational inference for Gaussian families on the Bures-Wasserstein (BW) geometry. It derives the Wasserstein gradient of IW-ELBO and its BW projection, proving that the gradient SNR scales as $\Omega(\sqrt{K})$, enabling stable optimization as $K$ grows, and extends the analysis to VR-IWAE with similar guarantees. A practical BW-IW-ELBO algorithm is developed with forward-Euler updates on the mean and covariance, and its mass-covering properties are demonstrated on challenging multimodal targets and a Bayesian logistic regression task, outperforming baselines. The work further extends to VR-IWAE, showing analogous gradient structure and optimization dynamics, confirming the practical value of transport-geometric methods for importance-weighted VI. Overall, the BW framework yields more robust tail coverage and faster convergence for Gaussian VI in complex posterior landscapes, with potential extensions to broader bounds and geometries.

Abstract

The Importance-Weighted Evidence Lower Bound (IW-ELBO) has emerged as an effective objective for variational inference (VI), tightening the standard ELBO and mitigating the mode-seeking behaviour. However, optimizing the IW-ELBO in Euclidean space is often inefficient, as its gradient estimators suffer from a vanishing signal-to-noise ratio (SNR). This paper formulates the optimisation of the IW-ELBO in Bures-Wasserstein space, a manifold of Gaussian distributions equipped with the 2-Wasserstein metric. We derive the Wasserstein gradient of the IW-ELBO and project it onto the Bures-Wasserstein space to yield a tractable algorithm for Gaussian VI. A pivotal contribution of our analysis concerns the stability of the gradient estimator. While the SNR of the standard Euclidean gradient estimator is known to vanish as the number of importance samples $K$ increases, we prove that the SNR of the Wasserstein gradient scales favourably as $Ω(\sqrt{K})$, ensuring optimisation efficiency even for large $K$. We further extend this geometric analysis to the Variational Rényi Importance-Weighted Autoencoder bound, establishing analogous stability guarantees. Experiments demonstrate that the proposed framework achieves superior approximation performance compared to other baselines.

Bures-Wasserstein Importance-Weighted Evidence Lower Bound: Exposition and Applications

TL;DR

, enabling stable optimization as

grows, and extends the analysis to VR-IWAE with similar guarantees. A practical BW-IW-ELBO algorithm is developed with forward-Euler updates on the mean and covariance, and its mass-covering properties are demonstrated on challenging multimodal targets and a Bayesian logistic regression task, outperforming baselines. The work further extends to VR-IWAE, showing analogous gradient structure and optimization dynamics, confirming the practical value of transport-geometric methods for importance-weighted VI. Overall, the BW framework yields more robust tail coverage and faster convergence for Gaussian VI in complex posterior landscapes, with potential extensions to broader bounds and geometries.

Abstract

increases, we prove that the SNR of the Wasserstein gradient scales favourably as

, ensuring optimisation efficiency even for large

. We further extend this geometric analysis to the Variational Rényi Importance-Weighted Autoencoder bound, establishing analogous stability guarantees. Experiments demonstrate that the proposed framework achieves superior approximation performance compared to other baselines.

Paper Structure (29 sections, 5 theorems, 69 equations, 6 figures, 2 tables, 1 algorithm)

This paper contains 29 sections, 5 theorems, 69 equations, 6 figures, 2 tables, 1 algorithm.

Introduction
Wasserstein Gradient
Signal-to-Noise Ratio
Bures-Wasserstein IW-ELBO VI
Extension to Generalised Bounds
Preliminaries
Setup and Notation
Variational Inference
Importance-Weighted Evidence Lower Bound
Wasserstein Geometry
Bures-Wasserstein Geometry
Bures-Wasserstein Importance-Weighted Evidence Lower Bound
Wasserstein Gradient of IW-ELBO
Signal-to-Noise Ratio of Wasserstein Gradient of IW-ELBO
Optimisation of the IW-ELBO on the BW Space
...and 14 more sections

Key Result

Proposition 1

Suppose standard conditions that allow interchanging derivative and expectation, the Wasserstein gradient of $\mathcal{F}_n(q_n)$ at $q_n = q$ is given by where we define $w(z) := p(x, z) / q(z)$ for notational convenience.

Figures (6)

Figure 1: Evolution of the variational approximation for a four-peaked eggbox target. The BW-IW-ELBO is the only method that successfully learns a single Gaussian to cover all four modes of the true target distribution, demonstrating superior mass-covering behavior.
Figure 2: SNR of the Wasserstein gradient estimator v.s. $K$ for all dimensions. The straight line is the fitted regression line.
Figure 3: Number of iterations until convergence for various $K$. The vertical bars show the variation across different runs
Figure 4: IW-ELBO bound estimates over iterations
Figure 5: Pairwise and Marginal Posterior Comparisons
...and 1 more figures

Theorems & Definitions (9)

Proposition 1
Theorem 1
Proposition 2
proof
Proposition 3
Theorem 2
proof : Proof of Proposition \ref{['pro: proposition1']}
proof : Proof of \ref{['the: Wass grad SNR']}
proof : Proof of \ref{['pro: proposition VR-IWAE']}

Bures-Wasserstein Importance-Weighted Evidence Lower Bound: Exposition and Applications

TL;DR

Abstract

Bures-Wasserstein Importance-Weighted Evidence Lower Bound: Exposition and Applications

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (6)

Theorems & Definitions (9)