Stochastic Optimization in Semi-Discrete Optimal Transport: Convergence Analysis and Minimax Rate
Ferdinand Genans, Antoine Godichon-Baggioni, François-Xavier Vialard, Olivier Wintenberger
TL;DR
This work analyzes SGD-based solvers for semi-discrete optimal transport, where a continuous source $\mu$ is transported to a discrete target $\nu$. It introduces projected SGD (PSGD) on the semi-dual and establishes minimax-optimal rates for estimating the OT map under MTW-type costs, including the quadratic cost on unbounded supports, via a localization projection set and restricted strong convexity. The paper shows that averaged PSGD achieves $\mathcal{O}\left(1/n\right)$ rates for OT quantities like the Brenier potential and $\mathcal{O}\left(1/\sqrt{n}\right)$ for the OT map, with matching minimax lower bounds, and provides non-averaged rates and detailed conditions under MTW and non-MTW costs. Numerical experiments corroborate the theoretical rates across compact and non-compact settings, demonstrating that these SGD-based solvers circumvent the curse of dimensionality in semi-discrete OT map estimation. Overall, the results extend convergence guarantees to online/sample-based regimes and non-compact supports, bridging theory and scalable applications in ML tasks involving semi-discrete OT.
Abstract
We investigate the semi-discrete Optimal Transport (OT) problem, where a continuous source measure $\mu$ is transported to a discrete target measure $\nu$, with particular attention to the approximation of the OT map. In this setting, Stochastic Gradient Descent (SGD) based solvers have demonstrated strong empirical performance in recent machine learning applications, yet whether they come with theoretical guarantees for approximating the OT map has remained an open question. In this work, we answer it positively by providing both computational and statistical convergence guarantees for SGD. Specifically, we show that SGD methods can estimate the OT map with a minimax convergence rate of $\mathcal{O}(1/\sqrt{n})$, where $n$ is the number of samples drawn from $\mu$. To establish this result, we study the averaged projected SGD algorithm and identify a suitable projection set that contains a minimizer of the objective, even when the source measure is not compactly supported. Our analysis holds under mild assumptions on the source measure and applies to MTW cost functions, which include $\|\cdot\|^p$ for $p \in (1, \infty)$. We finally provide numerical evidence for our theoretical results.
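To make the algorithmic idea concrete, the following is a minimal sketch of averaged projected SGD on the semi-dual, not the paper's exact procedure. Several choices are assumptions made for illustration only: the cost is fixed to $\frac{1}{2}\|x - y\|^2$, the projection set is replaced by a simple box $[-R, R]^M$ (the paper identifies its own projection set, which is not reproduced here), the step size $\gamma_0 / t^{3/4}$ is an arbitrary decaying schedule, and the names `averaged_psgd`, `ot_map`, `radius`, `gamma0`, and `decay` are hypothetical.

```python
import numpy as np


def averaged_psgd(sample_mu, Y, w, n, radius=10.0, gamma0=1.0, decay=0.75):
    """Averaged projected stochastic gradient ascent on the semi-dual.

    sample_mu: callable returning one sample x of shape (d,) from the source.
    Y: target atoms, shape (M, d).  w: target weights, shape (M,), summing to 1.
    The box [-radius, radius]^M is a stand-in projection set (assumption).
    """
    g = np.zeros(len(w))        # current dual iterate
    g_bar = np.zeros(len(w))    # running (Polyak-Ruppert) average of iterates
    for t in range(1, n + 1):
        x = sample_mu()
        cost = 0.5 * np.sum((Y - x) ** 2, axis=1)   # c(x, y_j) for all j
        j = np.argmin(cost - g)                      # cell containing x
        grad = w.copy()
        grad[j] -= 1.0                               # stochastic gradient w - e_j
        g = g + (gamma0 / t ** decay) * grad         # ascent step
        g = np.clip(g, -radius, radius)              # Euclidean projection on box
        g_bar += (g - g_bar) / t                     # update running average
    return g_bar


def ot_map(x, Y, g):
    """Plug-in map estimate: send x to the atom of the cell it falls in."""
    cost = 0.5 * np.sum((Y - x) ** 2, axis=1)
    return Y[np.argmin(cost - g)]


# Toy problem: Uniform[0, 1] source, two atoms with weights (0.3, 0.7).
rng = np.random.default_rng(0)
Y = np.array([[0.25], [0.75]])
w = np.array([0.3, 0.7])
g_bar = averaged_psgd(lambda: rng.uniform(size=1), Y, w, n=20000)

# Fraction of fresh samples sent to the first atom; should be close to w[0].
xs = rng.uniform(size=(5000, 1))
mass_first = np.mean([ot_map(x, Y, g_bar)[0] == 0.25 for x in xs])
```

In this toy problem the boundary between the two cells sits at $x = 0.5 + 2(g_1 - g_2)$, so the optimal potentials satisfy $g_1 - g_2 = -0.1$, which gives a quick sanity check on `g_bar`. The estimated map is piecewise constant, so only the differences between coordinates of `g_bar` matter.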
