
Stochastic Optimization in Semi-Discrete Optimal Transport: Convergence Analysis and Minimax Rate

Ferdinand Genans, Antoine Godichon-Baggioni, François-Xavier Vialard, Olivier Wintenberger

TL;DR

This work analyzes SGD-based solvers for semi-discrete optimal transport, where a continuous source $\mu$ is transported to a discrete target $\nu$. It introduces projected SGD (PSGD) on the semi-dual and establishes minimax-optimal rates for estimating the OT map under MTW-type costs, including the quadratic cost on unbounded supports, via a localization projection set and restricted strong convexity. The paper shows that averaged PSGD achieves $\mathcal{O}\left(1/n\right)$ rates for OT quantities like the Brenier potential and $\mathcal{O}\left(1/\sqrt{n}\right)$ for the OT map, with matching minimax lower bounds, and provides non-averaged rates and detailed conditions under MTW and non-MTW costs. Numerical experiments corroborate the theoretical rates across compact and non-compact settings, demonstrating that these SGD-based solvers circumvent the curse of dimensionality in the semi-discrete OT map estimation. Overall, the results extend convergence guarantees to online/sample-based regimes and non-compact supports, bridging theory and scalable applications in ML tasks involving semi-discrete OT.

Abstract

We investigate the semi-discrete Optimal Transport (OT) problem, where a continuous source measure $μ$ is transported to a discrete target measure $ν$, with particular attention to the OT map approximation. In this setting, Stochastic Gradient Descent (SGD) based solvers have demonstrated strong empirical performance in recent machine learning applications, yet whether they provably approximate the OT map is an open question. In this work, we answer it positively by providing both computational and statistical convergence guarantees for SGD. Specifically, we show that SGD methods can estimate the OT map with a minimax convergence rate of $\mathcal{O}(1/\sqrt{n})$, where $n$ is the number of samples drawn from $μ$. To establish this result, we study the averaged projected SGD algorithm and identify a suitable projection set that contains a minimizer of the objective, even when the source measure is not compactly supported. Our analysis holds under mild assumptions on the source measure and applies to MTW cost functions, which include $\|\cdot\|^p$ for $p \in (1, \infty)$. We finally provide numerical evidence for our theoretical results.
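To make the averaged projected SGD idea concrete, here is a minimal sketch in pure Python. It is an illustration under stated assumptions, not the paper's algorithm as published. It assumes the semi-dual is written as the minimization of $\mathbb{E}_{X \sim \mu}[\max_j (g_j - c(X, y_j))] - \langle w, g \rangle$, whose stochastic gradient at a sample $x$ is $e_{j^*(x)} - w$ with $j^*(x) = \arg\min_j c(x, y_j) - g_j$. The projection set is replaced by a plain sup-norm ball of a user-chosen radius, and the step size $\gamma_0 / k^{\alpha}$ is a common default for averaged SGD. The function names and the parameters `radius`, `gamma0`, and `alpha` are hypothetical.

```python
import random


def quadratic_cost(x, y):
    """c(x, y) = 0.5 * ||x - y||^2 for points given as sequences of floats."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))


def laguerre_cell(x, ys, g, cost=quadratic_cost):
    """Index j minimizing c(x, y_j) - g_j, i.e. the cell that contains x."""
    return min(range(len(ys)), key=lambda j: cost(x, ys[j]) - g[j])


def averaged_psgd(sample, ys, w, n, radius, gamma0=0.5, alpha=0.75,
                  cost=quadratic_cost):
    """Averaged projected SGD on the semi-dual (illustrative sketch).

    sample : callable returning one draw from the source measure
    ys, w  : target atoms and their weights (weights sum to 1)
    radius : radius of the sup-norm ball used as projection set (an assumption,
             standing in for the projection set identified in the paper)
    """
    m = len(ys)
    g = [0.0] * m       # current iterate
    g_bar = [0.0] * m   # running average of the iterates
    for k in range(1, n + 1):
        x = sample()
        j = laguerre_cell(x, ys, g, cost)
        step = gamma0 / k ** alpha
        for i in range(m):
            # Stochastic gradient component: indicator of the hit cell minus weight.
            grad_i = (1.0 if i == j else 0.0) - w[i]
            # Gradient step followed by projection (clipping) onto [-radius, radius].
            g[i] = max(-radius, min(radius, g[i] - step * grad_i))
        for i in range(m):
            g_bar[i] += (g[i] - g_bar[i]) / k
    return g_bar


def ot_map(x, ys, g, cost=quadratic_cost):
    """Plug-in OT map estimate: send x to the atom of the cell it falls in."""
    return ys[laguerre_cell(x, ys, g, cost)]
```

As a sanity check, take $\mu$ uniform on $[0, 1]$, atoms $0.25$ and $0.75$ with weights $0.3$ and $0.7$, and the quadratic cost. The cell boundary must sit at $x = 0.3$, which forces $g_2 - g_1 = 0.1$. Calling `averaged_psgd(lambda: [random.random()], [[0.25], [0.75]], [0.3, 0.7], n=20000, radius=1.0)` should return potentials whose difference is close to that value.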

Paper Structure

This paper contains 45 sections, 31 theorems, 182 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Lemma 3.1

Suppose that the OT problem is well-posed, strong duality holds, and the semi-dual problem (eq::semi_dual_cvx) admits a minimum. Then there exists a minimizer ${\mathbf{g}}^*$ contained in the set where $\|c\|_{K,\infty} := \sup_{x \in K, \, j \in \llbracket 1, M \rrbracket} |c(x, y_j)|$, for any compact $K$ satisfying $\mu(K) \geq 1 - \frac{1}{2} \min_j w_j$.
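As a worked instance of the quantity $\|c\|_{K,\infty}$, the sketch below evaluates it in closed form for one particular choice that is an assumption here, not something fixed by the lemma: the quadratic cost $c(x, y) = \frac{1}{2}\|x - y\|^2$ and $K$ the closed Euclidean ball of radius $r$ centered at the origin. In that case the supremum over $x \in K$ is attained at the point of the ball farthest from $y_j$, giving $\frac{1}{2}(r + \|y_j\|)^2$. The function name `cost_sup_on_ball` is hypothetical, and choosing $r$ so that $\mu(K) \geq 1 - \frac{1}{2} \min_j w_j$ is left to the user.

```python
import math


def cost_sup_on_ball(ys, r):
    """sup over x in the ball B(0, r) and over atoms y_j of 0.5 * ||x - y_j||^2.

    Assumes the quadratic cost and a ball centered at the origin; for each atom
    the supremum equals 0.5 * (r + ||y_j||)^2.
    """
    return max(0.5 * (r + math.sqrt(sum(c * c for c in y))) ** 2 for y in ys)
```

For example, with a single atom at $(3, 4)$ and $r = 1$, the farthest point of the ball is at distance $6$, so the value is $18$.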

Figures (3)

  • Figure 1: Convergence rates of our OT potential and OT map estimators across different settings.
  • Figure 2: S-Adam outperforms Adam on Ex. 1, avoiding convergence plateau.
  • Figure : Projected Stochastic Gradient Descent (PSGD)

Theorems & Definitions (60)

  • Lemma 3.1: Existence of a projection set
  • Example 1
  • Theorem 3.2: PSGD in the general setting
  • Proposition 4.1
  • Lemma 4.2
  • Remark 4.3
  • Theorem 4.4: Non-averaged iterates
  • Theorem 4.5: Averaged iterates
  • Corollary 4.6
  • Theorem 5.1
  • ...and 50 more