Convergence of coordinate ascent variational inference for log-concave measures via optimal transport

Manuel Arnese; Daniel Lacker

TL;DR

This work provides the first general convergence theory for Coordinate Ascent Variational Inference (CAVI) applied to mean field variational inference (MFVI) when the target is log-concave. The key observation is that the MFVI objective is geodesically convex in Wasserstein space, so CAVI acts as a block coordinate descent in that geometry. Under mild integrability, the iterates converge to a minimizer; a strictly convex potential yields a unique minimizer and weak convergence of the iterates. If the log-density has a Lipschitz gradient, the paper proves a linear convergence rate, and if the target is also strongly log-concave, an exponential rate, both characterized in terms of the Wasserstein diameter $R$ and problem constants. The Gaussian special case exhibits a dimension-free exponential rate, illustrating the sharpness of the theory. Overall, the results bridge optimal-transport geometry and classical convex optimization to deliver convergence guarantees for MFVI via CAVI.
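
For orientation, the objective and the coordinate update referred to above can be written out as follows. This is a sketch in assumed notation rather than the paper's exact statement: $\rho \propto e^{-\psi}$ on ${\mathbb R}^{k_1} \times \cdots \times {\mathbb R}^{k_d}$, $H(\cdot\,|\,\rho)$ denotes relative entropy, $x_{-i}$ collects all coordinates except $x_i$, and $\mu^{-i}$ is the product of the current factors other than the $i$-th (assuming the usual sequential sweep, so factors $1,\dots,i-1$ have already been updated).

```latex
% Assumed notation; a sketch of the standard MFVI problem and CAVI update.
\begin{align*}
\text{MFVI:}\quad
  & \min_{\mu = \mu^1 \otimes \cdots \otimes \mu^d} H(\mu \,|\, \rho),
  \qquad
  H(\mu \,|\, \rho) = \int \log \frac{d\mu}{d\rho} \, d\mu, \\
\text{CAVI update of factor } i\text{:}\quad
  & \mu^i_{\mathrm{new}}(x_i) \propto
    \exp\Big( - \int \psi(x_i, x_{-i}) \, \mu^{-i}(dx_{-i}) \Big).
\end{align*}
```

Each update minimizes $H(\cdot\,|\,\rho)$ over the $i$-th factor with the others held fixed, which is why the scheme can be read as block coordinate descent on the space of product measures.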

Abstract

Mean field variational inference (VI) is the problem of finding the closest product (factorized) measure, in the sense of relative entropy, to a given high-dimensional probability measure $\rho$. The well-known Coordinate Ascent Variational Inference (CAVI) algorithm aims to approximate this product measure by iteratively optimizing over one coordinate (factor) at a time, which can be done explicitly. Despite its popularity, the convergence of CAVI remains poorly understood. In this paper, we prove the convergence of CAVI for log-concave densities $\rho$. If additionally $\log \rho$ has Lipschitz gradient, we find a linear rate of convergence, and if also $\rho$ is strongly log-concave, we find an exponential rate. Our analysis starts from the observation that mean field VI, while notoriously non-convex in the usual sense, is in fact displacement convex in the sense of optimal transport when $\rho$ is log-concave. This allows us to adapt techniques from the optimization literature on coordinate descent algorithms in Euclidean space.
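
To make "optimizing over one coordinate at a time, which can be done explicitly" concrete, here is a minimal sketch for a Gaussian target $N(m, Q^{-1})$ with scalar blocks. It is an illustration under stated assumptions, not code from the paper: the function name `cavi_gaussian`, the precision matrix `Q`, the mean `m`, and the example values are all hypothetical. For this target each CAVI update is itself Gaussian with variance $1/Q_{ii}$, and the update of the means coincides with a Gauss-Seidel sweep.

```python
def cavi_gaussian(Q, m, mean0, sweeps):
    """Sequential CAVI sweeps for the target N(m, Q^{-1}) with scalar blocks.

    Q is a symmetric positive definite precision matrix (list of lists),
    m is the target mean, mean0 the initial factor means (all hypothetical inputs).
    Returns the final factor means, the factor variances, and the mean history.
    """
    d = len(m)
    mean = list(mean0)
    # After its first update, factor i is Gaussian with variance 1 / Q[i][i].
    var = [1.0 / Q[i][i] for i in range(d)]
    history = [list(mean)]
    for _ in range(sweeps):
        for i in range(d):
            # Expected interaction with the other factors, using their current means.
            coupling = sum(Q[i][j] * (mean[j] - m[j]) for j in range(d) if j != i)
            # Explicit minimizer over factor i with the other factors held fixed.
            mean[i] = m[i] - coupling / Q[i][i]
        history.append(list(mean))
    return mean, var, history


# Hypothetical two-dimensional example.
Q = [[2.0, 0.5], [0.5, 1.0]]
m = [1.0, -1.0]
mean, var, history = cavi_gaussian(Q, m, mean0=[0.0, 0.0], sweeps=30)

# Sup-norm distance of the factor means from the target mean after each sweep.
errors = [max(abs(a - b) for a, b in zip(h, m)) for h in history]
```

In this example the factor means approach the target mean `m` geometrically in the number of sweeps, while the factor variances `1 / Q[i][i]` are fixed by the diagonal of the precision matrix.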

Paper Structure (19 sections, 27 theorems, 118 equations)

Introduction
Setting and main results
Wasserstein Geometry of MFVI
CAVI as BCD in Wasserstein Space
Applications to Bayesian Linear Regression
The Gaussian case
Related Literature
Convexity and calculus in Wasserstein space
Geodesic convexity
Subdifferential calculus
The CAVI algorithm
Iterates are well defined
Uniform moment bounds
Qualitative Convergence
The Lipschitz gradient case
...and 4 more sections

Key Result

Theorem 1.1

Let $\psi : {\mathbb R}^k \to {\mathbb R}$ be a convex function such that $\rho(x) \propto e^{-\psi(x)}$ defines a probability density, and assume there exist finite constants $c > 0$ and $p \ge 2$ such that where $\nabla \psi$ is the weak gradient. Let $\mu_0 \in {\mathcal{P}}^{\otimes d}({\mathbb R}^{k_i})$ have finite $p$th moment, and define the CAVI iterates $\mu_n=\bigotimes_{i=1}^d \mu_n^i$ a

Theorems & Definitions (51)

Theorem 1.1
Remark 1.2
Proposition 1.3
Proposition 1.4
Proposition 2.1
Theorem 2.2
Lemma 2.3
proof
Definition 2.4
Theorem 2.5
...and 41 more
