Non-geodesically-convex optimization in the Wasserstein space

Hoang Phuc Hau Luu; Hanlin Yu; Bernardo Williams; Petrus Mikkola; Marcelo Hartmann; Kai Puolamäki; Arto Klami

TL;DR

This work studies optimization over the Wasserstein space $\mathcal{P}_2(\mathbb{R}^d)$ for a non-geodesically-convex objective $\mathcal{F}(\mu)=\mathcal{E}_{F}(\mu)+\mathscr{H}(\mu)$, where $F=G-H$ is a difference-of-convex (DC) function and $\mathscr{H}$ is convex along generalized geodesics. It introduces a semi Forward-Backward Euler scheme that alternates a forward step on the concave part of the DC decomposition with a backward (JKO) step on the convex part, exploiting Brenier maps to obtain convergence guarantees even when $F$ is nonconvex. The authors establish asymptotic and nonasymptotic rates for the Wasserstein gradient mapping and Fréchet subdifferentials, and show global convergence under a Łojasiewicz-type inequality with explicit rates in three regimes of the exponent $\theta$, plus KL-based convergence when $\mathscr{H}$ is the negative entropy. They also provide practical transport-map-based implementations using input convex neural networks and illustrate the approach on nonconvex sampling tasks such as Gaussian mixtures and distance-to-set priors.
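To make the forward/backward alternation concrete, here is a minimal particle-level sketch in one dimension. It rests on assumptions that are not taken from the paper: the regularizer $\mathscr{H}$ is dropped (set to zero), so that the backward JKO step on the potential energy of a convex $G$ reduces to pushing particles through the proximal map of $\eta G$; and the toy potential $F(x) = x^2/2 - 2\log\cosh(x)$, with $G(x)=x^2/2$ and $H(x)=2\log\cosh(x)$, is chosen only for illustration. All function names are hypothetical. This is not the authors' neural-network implementation.

```python
import math
import random

# Toy DC decomposition F = G - H (an illustrative assumption, not from the paper):
#   G(x) = x^2 / 2            (convex)
#   H(x) = 2 * log(cosh(x))   (convex and smooth)
# F is a double well with minima at the nonzero solutions of x = 2 * tanh(x).

def potential_F(x):
    return 0.5 * x * x - 2.0 * math.log(math.cosh(x))

def grad_H(x):
    return 2.0 * math.tanh(x)

def prox_G(y, eta):
    # Proximal map of eta * G for G(x) = x^2 / 2: argmin_x G(x) + (x - y)^2 / (2 * eta).
    return y / (1.0 + eta)

def mean_energy(particles):
    # Empirical potential energy: average of F over the particles.
    return sum(potential_F(x) for x in particles) / len(particles)

def semi_fb_euler_particles(particles, eta, n_steps):
    """Alternate a forward step on -H and a backward (proximal) step on G."""
    energies = [mean_energy(particles)]
    for _ in range(n_steps):
        # Forward step on the concave part -H: move along +grad H.
        forward = [x + eta * grad_H(x) for x in particles]
        # Backward step on the convex part G: with no regularizer, this is a prox.
        particles = [prox_G(y, eta) for y in forward]
        energies.append(mean_energy(particles))
    return particles, energies

random.seed(0)
x0 = [random.gauss(0.0, 1.0) for _ in range(200)]
xs, energies = semi_fb_euler_particles(x0, eta=0.5, n_steps=200)
print("initial energy:", energies[0], "final energy:", energies[-1])
```

In this simplified setting the empirical energy is non-increasing for any step size, and the particles settle in the two wells of $F$. With an entropy term the backward step is no longer a pointwise prox, which is where transport-map parametrizations become relevant.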

Abstract

We study a class of optimization problems in the Wasserstein space (the space of probability measures) where the objective function is nonconvex along generalized geodesics. Specifically, the objective exhibits some difference-of-convex structure along these geodesics. The setting also encompasses sampling problems where the logarithm of the target distribution is difference-of-convex. We derive multiple convergence insights for a novel semi Forward-Backward Euler scheme under several nonconvex (and possibly nonsmooth) regimes. Notably, the semi Forward-Backward Euler is just a slight modification of the Forward-Backward Euler, whose convergence is, to our knowledge, still unknown in our very general non-geodesically-convex setting.

Paper Structure (41 sections, 19 theorems, 122 equations, 1 figure, 2 algorithms)

Introduction
Why difference-of-convex structure?
Context
Contributions
Preliminaries
Notations and basic results in measure theory and functional analysis
Optimal transport [ambrosio2005gradient, ambrosio2006gradient, villani2021topics, villani2009optimal]
Subdifferential calculus in the Wasserstein space
Optimization in the Wasserstein space
First-order optimality conditions
Semi Forward-Backward Euler for difference-of-convex structures
Wasserstein gradient flows: different types of discretizations
Problem setting
Optimality characterizations
Semi FB Euler: a general setting
...and 26 more sections

Key Result

Lemma 1

Under Assumptions assum_main and assump:measurable, let $\{\mu_n\}_{n \in \mathbb{N}}$ be the sequence of distributions produced by semi FB Euler starting from some $\mu_0 \in \mathcal{P}_{2, \mathop{\mathrm{abs}}\nolimits}(X)$ with $0<\eta < \eta_0$. Then it holds that $\mathcal{F}(\mu_{n+1}) \leq \dots$
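For intuition about why a descent estimate of this kind is plausible, the following worked derivation treats a Euclidean, pointwise analogue. It is an illustration under stated assumptions and not the lemma's statement or proof: $G$ and $H$ are convex on $\mathbb{R}^d$, $H$ is differentiable, $F = G - H$, there is no regularizer $\mathscr{H}$, and the update is $x^{+} = \operatorname{prox}_{\eta G}\big(x + \eta \nabla H(x)\big)$.

$$
\begin{aligned}
x^{+} &= \operatorname*{arg\,min}_{z}\; G(z) - \langle \nabla H(x),\, z - x \rangle + \tfrac{1}{2\eta}\,\|z - x\|^{2}
&&\text{(expand the square in the prox objective)}\\
G(x^{+}) - \langle \nabla H(x),\, x^{+} - x \rangle + \tfrac{1}{2\eta}\,\|x^{+} - x\|^{2} &\le G(x)
&&\text{(compare with } z = x\text{)}\\
H(x^{+}) &\ge H(x) + \langle \nabla H(x),\, x^{+} - x \rangle
&&\text{(convexity of } H\text{)}\\
F(x^{+}) = G(x^{+}) - H(x^{+}) &\le F(x) - \tfrac{1}{2\eta}\,\|x^{+} - x\|^{2}
&&\text{(subtract the third line from the second)}
\end{aligned}
$$

In this simplified setting the decrease holds for every $\eta > 0$. The step-size restriction $0 < \eta < \eta_0$ in the lemma belongs to the paper's more general setting and is not explained by this sketch.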

Figures (1)

Figure 1: (a) and (b): Mixture of Gaussians. (a) shows samples obtained from semi FB Euler at iteration $40$ and (b) shows KL divergence along the training process: semi FB Euler with sound theory is as fast as FB Euler. We also show the ULA's final result as a horizontal line for reference; (c) and (d): Relaxed von Mises-Fisher. (c) shows true probability density, and (d) shows the sample histogram obtained from semi FB Euler. In this experiment, FB Euler fails to work, attributed to the high curvature of the relaxed von Mises-Fisher.

Theorems & Definitions (30)

Definition 1
Lemma 1: Descent lemma
Theorem 1: Asymptotic convergence
Remark 1
Theorem 2: Convergence rate: Wasserstein (sub)gradient mapping
Theorem 3: Convergence rate: Fréchet subdifferentials
Remark 2
Theorem 4
Remark 3
Theorem 5
...and 20 more
