Non-geodesically-convex optimization in the Wasserstein space

Hoang Phuc Hau Luu, Hanlin Yu, Bernardo Williams, Petrus Mikkola, Marcelo Hartmann, Kai Puolamäki, Arto Klami

TL;DR

This work studies optimization over the Wasserstein space $\mathcal{P}_2(\mathbb{R}^d)$ for a non-geodesically-convex objective $\mathcal{F}(\mu)=\mathcal{E}_{F}(\mu)+\mathscr{H}(\mu)$, where $F=G-H$ is a difference of convex functions (DC) and $\mathscr{H}$ is convex along generalized geodesics. It introduces a semi Forward-Backward Euler scheme that alternates a forward step on the concave part of the DC decomposition with a backward JKO step on the convex part, exploiting Brenier maps to obtain convergence guarantees even when $F$ is nonconvex. The authors establish asymptotic convergence and nonasymptotic rates for the Wasserstein gradient mapping and Fréchet subdifferentials, and show global convergence under a Łojasiewicz-type inequality, with explicit rates in three regimes of the exponent $\theta$, plus KL-based convergence when $\mathscr{H}$ is the negative entropy. They also provide practical transport-map-based implementations using input-convex neural networks and illustrate the approach on nonconvex sampling tasks such as Gaussian mixtures and distance-to-set priors.
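
The alternation described above can be illustrated with a toy particle sketch. It rests on several assumptions that are not taken from the paper: the term $\mathscr{H}$ is dropped, so the backward step reduces to a Euclidean proximal map applied to each particle; the functions $G(x)=x^2/2$ and $H(x)=2\sqrt{1+x^2}$, the step size, and all function names are hypothetical choices made only so that the proximal map has a closed form. It is not the authors' implementation, which relies on input-convex neural networks.

```python
import math


def grad_H(x):
    # Assumed toy choice: H(x) = 2 * sqrt(1 + x^2), convex and smooth.
    return 2.0 * x / math.sqrt(1.0 + x * x)


def prox_G(y, eta):
    # Assumed toy choice: G(x) = x^2 / 2, so
    # argmin_x G(x) + (x - y)^2 / (2 * eta) = y / (1 + eta).
    return y / (1.0 + eta)


def F(x):
    # Nonconvex DC potential F = G - H (local maximum at 0, minima at +/- sqrt(3)).
    return 0.5 * x * x - 2.0 * math.sqrt(1.0 + x * x)


def semi_fb_particles(particles, eta=0.5, n_iters=200):
    """Forward step on the concave part -H, backward (proximal) step on G."""
    energies = []
    for _ in range(n_iters):
        # Mean potential energy of the empirical measure before the update.
        energies.append(sum(F(x) for x in particles) / len(particles))
        # Forward step: x - eta * grad(-H)(x) = x + eta * grad H(x).
        forward = [x + eta * grad_H(x) for x in particles]
        # Backward step: proximal map of the convex part G.
        particles = [prox_G(y, eta) for y in forward]
    energies.append(sum(F(x) for x in particles) / len(particles))
    return particles, energies


particles, energies = semi_fb_particles([-2.5, -0.3, 0.4, 3.0])
```

In this toy problem the stationary points of $F$ are $0$ and $\pm\sqrt{3}$, so each particle should settle at one of the two minimizers while the mean energy does not increase from one iteration to the next, mirroring the descent property the paper proves for the full scheme.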

Abstract

We study a class of optimization problems in the Wasserstein space (the space of probability measures) where the objective function is nonconvex along generalized geodesics. Specifically, the objective exhibits some difference-of-convex structure along these geodesics. The setting also encompasses sampling problems where the logarithm of the target distribution is difference-of-convex. We derive multiple convergence insights for a novel semi Forward-Backward Euler scheme under several nonconvex (and possibly nonsmooth) regimes. Notably, the semi Forward-Backward Euler is just a slight modification of the Forward-Backward Euler whose convergence is -- to our knowledge -- still unknown in our very general non-geodesically-convex setting.
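
The link between this objective and sampling can be made explicit with a short, standard computation. It is a sketch under assumptions: the target is written as $\pi \propto e^{-F}$ with normalizing constant $Z$ (a symbol introduced here), $\mathcal{E}_F(\mu)$ is read as the potential energy $\int F\,d\mu$, and $\mathscr{H}$ is taken to be the negative entropy.

```latex
\begin{aligned}
\pi(x) &= \frac{e^{-F(x)}}{Z}, \qquad Z = \int e^{-F(x)}\,dx, \qquad F = G - H, \\
\mathrm{KL}(\mu \,\|\, \pi)
  &= \int \mu(x)\log\mu(x)\,dx - \int \mu(x)\log\pi(x)\,dx \\
  &= \underbrace{\int \mu(x)\log\mu(x)\,dx}_{\mathscr{H}(\mu)}
   + \underbrace{\int F(x)\,d\mu(x)}_{\mathcal{E}_F(\mu)}
   + \log Z .
\end{aligned}
```

Under these assumptions, minimizing $\mathcal{E}_F + \mathscr{H}$ is the same as minimizing $\mathrm{KL}(\mu\,\|\,\pi)$, since $\log Z$ does not depend on $\mu$; the objective is nonconvex along generalized geodesics whenever $-\log\pi$ is only difference-of-convex rather than convex.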

Paper Structure (41 sections, 19 theorems, 122 equations, 1 figure, 2 algorithms)

Key Result

Lemma 1

Under Assumptions assum_main and assump:measurable, let $\{\mu_n\}_{n \in \mathbb{N}}$ be the sequence of distributions produced by semi FB Euler starting from some $\mu_0 \in \mathcal{P}_{2, \mathop{\mathrm{abs}}\nolimits}(X)$ with $0<\eta < \eta_0$. Then it holds that $\mathcal{F}(\mu_{n+1}) \leq \dots$

Figures (1)

  • Figure 1: (a) and (b): Mixture of Gaussians. (a) shows samples obtained from semi FB Euler at iteration $40$, and (b) shows the KL divergence along the training process: semi FB Euler, which comes with sound theory, is as fast as FB Euler. ULA's final result is also shown as a horizontal line for reference. (c) and (d): Relaxed von Mises-Fisher. (c) shows the true probability density, and (d) shows the sample histogram obtained from semi FB Euler. In this experiment, FB Euler fails to work, which is attributed to the high curvature of the relaxed von Mises-Fisher.

Theorems & Definitions (30)

  • Definition 1
  • Lemma 1: Descent lemma
  • Theorem 1: Asymptotic convergence
  • Remark 1
  • Theorem 2: Convergence rate: Wasserstein (sub)gradient mapping
  • Theorem 3: Convergence rate: Fréchet subdifferentials
  • Remark 2
  • Theorem 4
  • Remark 3
  • Theorem 5
  • ...and 20 more