First-order Conditions for Optimization in the Wasserstein Space

Nicolas Lanzetti, Saverio Bolognani, Florian Dörfler

TL;DR

This work tackles constrained optimization over the space of probability measures endowed with the Wasserstein distance $W_2$. It develops a rigorous differential-variational framework, introducing Wasserstein subdifferentials and gradients, and derives novel necessary and sufficient KKT-type conditions for equality and inequality constraints. The authors demonstrate how these first-order conditions yield interpretable criteria and, in several cases, closed-form solutions for distributionally robust optimization and Kullback–Leibler inference problems. The methodology unifies mean-variance, KL-divergence, and Wasserstein-distance functionals within a single variational setting, enabling tractable analysis and design of robust statistical procedures. Overall, the paper provides a principled toolkit for optimization in the space of probability measures with practical implications for DRO and statistical inference.

Abstract

We study first-order optimality conditions for constrained optimization in the Wasserstein space, whereby one seeks to minimize a real-valued function over the space of probability measures endowed with the Wasserstein distance. Our analysis combines recent insights on the geometry and the differential structure of the Wasserstein space with more classical calculus of variations. We show that simple rationales such as "setting the derivative to zero" and "gradients are aligned at optimality" carry over to the Wasserstein space. We deploy our tools to study and solve optimization problems in the setting of distributionally robust optimization and statistical inference. The generality of our methodology allows us to naturally deal with functionals, such as mean-variance, Kullback-Leibler divergence, and Wasserstein distance, which are traditionally difficult to study in a unified framework.
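
As a concrete distributionally robust instance, consider a linear loss $f(x)=\langle c,x\rangle$ and a $W_2$ ball of radius $\varepsilon$ around an empirical measure $\hat\mu$. A standard argument, not taken from the paper, shows that $\mathbb{E}_\mu[f]-\mathbb{E}_{\hat\mu}[f]\le\|c\|\,W_1(\mu,\hat\mu)\le\|c\|\,W_2(\mu,\hat\mu)\le\varepsilon\|c\|$. Translating every sample by $\varepsilon c/\|c\|$ attains this bound. The minimal sketch below checks this numerically in pure Python. The helper names (`worst_case_shift`, `mean_linear_loss`, `coupling_cost`) and the data are hypothetical and chosen for illustration.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))

def mean_linear_loss(samples, c):
    """Expectation of f(x) = <c, x> under the uniform measure on `samples`."""
    return sum(sum(ci * xi for ci, xi in zip(c, x)) for x in samples) / len(samples)

def worst_case_shift(samples, c, eps):
    """Translate every sample by v = eps * c / ||c||.

    A translation is the gradient of the convex function |x|^2 / 2 + <v, x>,
    so the coupling x_i <-> x_i + v is optimal and moves the measure by
    exactly W2 = ||v|| = eps.
    """
    n_c = norm(c)
    v = [eps * ci / n_c for ci in c]
    return [[xi + vi for xi, vi in zip(x, v)] for x in samples]

def coupling_cost(xs, ys):
    """Root-mean-square displacement of the coupling xs[i] <-> ys[i]."""
    sq = [norm([a - b for a, b in zip(x, y)]) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))

samples = [[0.0, 0.0], [1.0, 2.0], [-1.0, 3.0]]
c, eps = [3.0, 4.0], 0.5
shifted = worst_case_shift(samples, c, eps)
print(coupling_cost(samples, shifted))       # ≈ 0.5, the full budget eps
print(mean_linear_loss(samples, c))          # ≈ 6.667 (= 20/3), nominal risk
print(mean_linear_loss(shifted, c))          # ≈ 9.167 (= 20/3 + eps * ||c||)
```

In this example, the whole budget is spent moving mass rigidly along $c$. The worst case in the ball is therefore a translated copy of $\hat\mu$.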

Paper Structure (21 sections, 39 theorems, 219 equations, 1 table)


Key Result

Theorem 2.2

Let $\mu,\nu\in\mathcal{P}_{2}(\mathbb{R}^d)$, and assume that $\mu$ is absolutely continuous w.r.t. the Lebesgue measure. Then there exists a unique optimal transport plan $\gamma$, induced by a unique optimal transport map $T_{\mu}^{\nu}$, and $T_{\mu}^{\nu}=\nabla\phi$ …
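
In one dimension, the structure behind Brenier's theorem is easy to see numerically. The optimal map is nondecreasing, which is the 1-D form of "gradient of a convex function". The sketch below uses uniform empirical measures with equally many points. Such measures are not absolutely continuous, so the theorem does not apply verbatim. In this setting an optimal plan can be taken to be a permutation, and for the squared cost the sorted pairing is optimal. The function names and data are hypothetical. A brute-force search over permutations confirms the sorted result on a tiny example.

```python
import itertools

def w2_squared_sorted(xs, ys):
    """W2^2 between uniform empirical measures on xs and ys (equal size), 1-D.

    The optimal coupling pairs the i-th smallest point of xs with the i-th
    smallest point of ys, i.e. it follows a nondecreasing map, the 1-D
    counterpart of the gradient of a convex function.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    pairs = zip(sorted(xs), sorted(ys))
    return sum((x - y) ** 2 for x, y in pairs) / len(xs)

def w2_squared_brute_force(xs, ys):
    """Same quantity by minimizing over all permutations (tiny n only)."""
    n = len(xs)
    return min(
        sum((xs[i] - ys[p[i]]) ** 2 for i in range(n)) / n
        for p in itertools.permutations(range(n))
    )

xs = [0.0, 2.0, 1.0, 5.0]
ys = [3.0, -1.0, 4.0, 0.5]
print(w2_squared_sorted(xs, ys))       # 0.8125
print(w2_squared_brute_force(xs, ys))  # 0.8125
```

Sorting costs $O(n\log n)$, while the brute force grows factorially. The brute force is included only as a check.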

Theorems & Definitions (95)

  • Example 1.1: Distributionally robust optimization
  • Example 1.2: Statistical inference
  • Example 1.3: Maximum likelihood deconvolution
  • Definition 2.1: Wasserstein distance [Ambrosio2008a]
  • Theorem 2.2: Brenier's theorem [Ambrosio2008a]
  • Proposition 2.3: inverse of optimal transport maps
  • Remark 2.4
  • Proposition 2.5: pushforward via gradients of convex functions [Santambrogio2015]
  • Lemma 2.6: perturbation of probability measures [bonnet2019optimal]
  • Definition 2.7: Wasserstein sub- and super-differential [Bonnet2019a]
  • ...and 85 more
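
Gaussians give a closed-form picture of the objects listed above (Wasserstein distance, Brenier maps, inverses of optimal maps, and pushforwards via gradients of convex functions). For $\mu=\mathcal{N}(m_1,s_1^2)$ and $\nu=\mathcal{N}(m_2,s_2^2)$ on $\mathbb{R}$, it is a standard fact that $T(x)=m_2+\tfrac{s_2}{s_1}(x-m_1)$ is the optimal map. This map is the derivative of the convex function $\phi(x)=m_2x+\tfrac{s_2}{2s_1}(x-m_1)^2$. The same fact gives $W_2^2(\mu,\nu)=(m_1-m_2)^2+(s_1-s_2)^2$. The sketch below is written in the spirit of Propositions 2.3 and 2.5, whose exact statements are not reproduced here. It checks that the inverse map transports $\nu$ back to $\mu$ and that pushing samples of $\mu$ through $T$ matches $\nu$. The helper names, parameters, and seed are hypothetical.

```python
import math
import random

def gaussian_brenier_map(m1, s1, m2, s2):
    """Optimal map from N(m1, s1^2) to N(m2, s2^2) in 1-D.

    T(x) = m2 + (s2 / s1) * (x - m1) is the derivative of the convex function
    phi(x) = m2 * x + s2 / (2 * s1) * (x - m1) ** 2.
    """
    return lambda x: m2 + (s2 / s1) * (x - m1)

def gaussian_w2_squared(m1, s1, m2, s2):
    """Closed-form W2^2 between two 1-D Gaussians."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

m1, s1, m2, s2 = 0.0, 1.0, 2.0, 3.0
T = gaussian_brenier_map(m1, s1, m2, s2)
T_inv = gaussian_brenier_map(m2, s2, m1, s1)  # optimal map back from nu to mu

rng = random.Random(0)
xs = [rng.gauss(m1, s1) for _ in range(100_000)]
pushed = [T(x) for x in xs]  # samples of the pushforward T#mu, approx. N(m2, s2^2)

# Monte Carlo estimate of the transport cost E|x - T(x)|^2 under mu.
mc_cost = sum((x - y) ** 2 for x, y in zip(xs, pushed)) / len(xs)
mean_pushed = sum(pushed) / len(pushed)
std_pushed = math.sqrt(sum((y - mean_pushed) ** 2 for y in pushed) / len(pushed))

print(gaussian_w2_squared(m1, s1, m2, s2))  # 8.0
print(mc_cost)                              # close to 8.0 (Monte Carlo estimate)
```

The Monte Carlo cost only approximates the closed form. The point of the check is that the monotone affine map already achieves the optimal cost, so no other coupling can do better.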