
Convergence of coordinate ascent variational inference for log-concave measures via optimal transport

Manuel Arnese, Daniel Lacker

TL;DR

This work provides the first general convergence theory for Coordinate Ascent Variational Inference (CAVI) in mean field variational inference (MFVI) when the target is log-concave. It shows that the MFVI objective is geodesically convex in Wasserstein space and that CAVI acts as block coordinate descent in that geometry. Under mild integrability assumptions, the iterates converge to a minimizer; a strictly convex potential yields a unique minimizer and weak convergence of the iterates. If the log-density additionally has a Lipschitz gradient, the paper proves a linear convergence rate, and under strong convexity an exponential rate, both expressed in terms of the Wasserstein diameter $R$ and problem constants. In the Gaussian special case the exponential rate is dimension-free, which illustrates the sharpness of the theory. Overall, the results combine optimal-transport geometry with classical convex optimization to deliver practical convergence guarantees for MFVI via CAVI.
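To make the Gaussian special case concrete, here is a minimal Python sketch (not the authors' code) of CAVI for a Gaussian target $\rho = N(m, \Lambda^{-1})$ with scalar coordinates. It relies on the standard fact that each CAVI factor is Gaussian with variance $1/\Lambda_{ii}$, so only the means move, and each mean update is one Gauss-Seidel step on the linear system $\Lambda(a - m) = 0$. The function name `gaussian_cavi_gaps`, the random precision matrix, and the starting point are illustrative assumptions. The sketch tracks the KL gap $\tfrac12 (a-m)^\top \Lambda (a-m)$ after each sweep, which should decrease monotonically and geometrically.

```python
import numpy as np


def gaussian_cavi_gaps(Lam, m, a0, n_sweeps):
    """CAVI for rho = N(m, Lam^{-1}) with scalar coordinates (illustrative sketch).

    Assumes every factor q_i already has its CAVI variance 1 / Lam[i, i], so only
    the means a_i are updated. Returns KL(q | rho) - min KL = 0.5 (a-m)^T Lam (a-m)
    at the start and after each full sweep.
    """
    a = np.array(a0, dtype=float)
    e = a - m
    gaps = [0.5 * e @ Lam @ e]
    for _ in range(n_sweeps):
        for i in range(len(m)):  # sequential (Gauss-Seidel) coordinate updates
            # sum over j != i of Lam[i, j] * (a_j - m_j), using the latest means
            off = Lam[i] @ (a - m) - Lam[i, i] * (a[i] - m[i])
            a[i] = m[i] - off / Lam[i, i]  # mean of the updated Gaussian factor
        e = a - m
        gaps.append(0.5 * e @ Lam @ e)
    return np.array(gaps)


rng = np.random.default_rng(0)
k = 5
B = rng.standard_normal((k, k))
Lam = B @ B.T + k * np.eye(k)  # positive definite precision: strongly log-concave target
m = rng.standard_normal(k)
gaps = gaussian_cavi_gaps(Lam, m, a0=np.zeros(k), n_sweeps=20)
print(gaps[:5])
```

Printing `gaps[1:] / gaps[:-1]` shows the per-sweep contraction factor for this particular matrix; the sketch does not by itself check the dimension-free constant established in the paper.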

Abstract

Mean field variational inference (VI) is the problem of finding the closest product (factorized) measure, in the sense of relative entropy, to a given high-dimensional probability measure $\rho$. The well-known Coordinate Ascent Variational Inference (CAVI) algorithm aims to approximate this product measure by iteratively optimizing over one coordinate (factor) at a time, which can be done explicitly. Despite its popularity, the convergence of CAVI remains poorly understood. In this paper, we prove the convergence of CAVI for log-concave densities $\rho$. If additionally $\log \rho$ has Lipschitz gradient, we find a linear rate of convergence, and if also $\rho$ is strongly log-concave, we find an exponential rate. Our analysis starts from the observation that mean field VI, while notoriously non-convex in the usual sense, is in fact displacement convex in the sense of optimal transport when $\rho$ is log-concave. This allows us to adapt techniques from the optimization literature on coordinate descent algorithms in Euclidean space.
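The explicit one-factor update can be illustrated with a hedged, discretized Python sketch for a two-dimensional target with scalar coordinates. Assuming the standard CAVI update $q_1(x) \propto \exp(-\mathbb{E}_{q_2}[\psi(x, Y)])$, and symmetrically for $q_2$, the code restricts both factors to a grid, so each update is a normalized $\exp(-Pq)$ for the matrix $P_{ij} = \psi(x_i, y_j)$. The potential `psi` is a hypothetical strictly convex example, and the helper names, grid range, and resolution are arbitrary choices. Because each step exactly minimizes the discrete relative entropy over one factor with the other held fixed, the recorded objective never increases.

```python
import numpy as np


def normalized_exp_neg(v):
    """Probability vector proportional to exp(-v), computed stably."""
    w = np.exp(-(v - v.min()))
    return w / w.sum()


def neg_entropy(q):
    """Sum of q log q, with the convention 0 log 0 = 0."""
    nz = q[q > 0]
    return np.sum(nz * np.log(nz))


def free_energy(P, q1, q2):
    """Discrete KL(q1 x q2 | rho_grid), up to the additive constant log Z."""
    return q1 @ P @ q2 + neg_entropy(q1) + neg_entropy(q2)


def cavi_grid(psi, xs, ys, n_sweeps):
    """Grid-based CAVI for rho(x, y) proportional to exp(-psi(x, y))."""
    P = psi(xs[:, None], ys[None, :])  # P[i, j] = psi(xs[i], ys[j])
    q1 = np.full(len(xs), 1.0 / len(xs))  # uniform initial factors
    q2 = np.full(len(ys), 1.0 / len(ys))
    history = [free_energy(P, q1, q2)]
    for _ in range(n_sweeps):
        q1 = normalized_exp_neg(P @ q2)    # q1(x) ∝ exp(-E_{q2}[psi(x, Y)])
        q2 = normalized_exp_neg(P.T @ q1)  # q2(y) ∝ exp(-E_{q1}[psi(X, y)])
        history.append(free_energy(P, q1, q2))
    return q1, q2, np.array(history)


def psi(x, y):
    # Hypothetical convex potential: 0.5 x^2 + 0.25 y^4 + log(2 cosh(x + y)).
    s = x + y
    return 0.5 * x**2 + 0.25 * y**4 + np.logaddexp(s, -s)


xs = np.linspace(-6.0, 6.0, 401)
ys = np.linspace(-6.0, 6.0, 401)
q1, q2, history = cavi_grid(psi, xs, ys, n_sweeps=30)
print(history[:5])
```

The grid discretization is only a device for making the explicit update visible; it is not the continuous-space setting in which the paper's theorems are stated.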

Paper Structure

This paper contains 19 sections, 27 theorems, and 118 equations.

Key Result

Theorem 1.1

Let $\psi : {\mathbb R}^k \to {\mathbb R}$ be a convex function such that $\rho(x) \propto e^{-\psi(x)}$ defines a probability density, and assume there exist finite constants $c > 0$ and $p \ge 2$ such that where $\nabla \psi$ is the weak gradient. Let $\mu_0 \in {\mathcal{P}}^{\otimes d}({\mathbb R}^{k_i})$ have finite $p$-th moment, and define the CAVI iterates $\mu_n=\bigotimes_{i=1}^d \mu_n^i$ a
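For orientation, here is the coordinate update in the standard sequential form of CAVI; that the theorem uses exactly this ordering and indexing is an assumption here. Writing $x = (x_1, \dots, x_d)$ with $x_i \in {\mathbb R}^{k_i}$, one sweep updates the factors $i = 1, \dots, d$ in turn:

$$
\mu_{n+1}^i(dx_i) \propto \exp\left( -\int \psi(x_1, \dots, x_d) \prod_{j<i} \mu_{n+1}^j(dx_j) \prod_{j>i} \mu_n^j(dx_j) \right) dx_i ,
$$

where the integral runs over all coordinates except $x_i$. Each such update is the exact minimizer of the relative entropy with respect to $\rho$ over the $i$-th factor with the other factors held fixed, so the relative entropy cannot increase along the iterates.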

Theorems & Definitions (51)

  • Theorem 1.1
  • Remark 1.2
  • Proposition 1.3
  • Proposition 1.4
  • Proposition 2.1
  • Theorem 2.2
  • Lemma 2.3
  • proof
  • Definition 2.4
  • Theorem 2.5
  • ...and 41 more