Properties of Discrete Sliced Wasserstein Losses
Eloi Tanguy, Rémi Flamary, Julie Delon
TL;DR
This work analyzes the energy landscape of the Sliced Wasserstein distance between two uniform discrete measures, focusing on the function $\mathcal{E}(Y)=\mathrm{SW}_2^2(\gamma_Y,\gamma_Z)$ and its Monte-Carlo surrogate $\mathcal{E}_p$, which replaces the expectation over projection directions by an average over $p$ random samples. It establishes regularity (local Lipschitzness, a.e. differentiability), describes a cell decomposition on which $\mathcal{E}_p$ is piecewise quadratic and semi-concave, and proves almost-sure uniform convergence of $\mathcal{E}_p$ to $\mathcal{E}$ together with a uniform CLT for the approximation error. The paper then studies the optimization landscapes: global optima of $\mathcal{E}$ are the configurations with $\gamma_Y=\gamma_Z$, critical points satisfy a fixed-point relation, and as $p\to\infty$ the critical points of $\mathcal{E}_p$ approximate those of $\mathcal{E}$ with explicit convergence rates. Building on this, it provides a rigorous framework for stochastic gradient methods, showing convergence of interpolated and noised SGD schemes to Clarke critical points, and discusses generalizations to batching and barycenters. Numerical experiments corroborate the theory, illustrating how local optima emerge and are diluted as $p$ increases, and showing SGD dynamics across dimensions. Overall, the results offer theoretical guarantees for optimizing sliced-Wasserstein energies in discrete settings and clarify how the number of projections and the noise level affect convergence toward the true SW landscape.
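For reference, the two energies named above can be written out explicitly. The notation here is the standard one for sliced Wasserstein distances and is an assumption of this sketch, so it may differ from the paper's: $\sigma$ is the uniform measure on the unit sphere $\mathbb{S}^{d-1}$, $P_\theta(x)=\theta^\top x$ is the projection onto direction $\theta$, $\gamma_Y=\frac{1}{n}\sum_{k=1}^n \delta_{y_k}$, and $\theta_1,\dots,\theta_p$ are i.i.d. samples from $\sigma$.

$$
\mathcal{E}(Y)=\int_{\mathbb{S}^{d-1}} \mathrm{W}_2^2\big(P_\theta\#\gamma_Y,\,P_\theta\#\gamma_Z\big)\,\mathrm{d}\sigma(\theta),
\qquad
\mathcal{E}_p(Y)=\frac{1}{p}\sum_{i=1}^{p} \mathrm{W}_2^2\big(P_{\theta_i}\#\gamma_Y,\,P_{\theta_i}\#\gamma_Z\big).
$$

In one dimension, optimal transport between two uniform measures with $n$ points each matches sorted projections, so that

$$
\mathrm{W}_2^2\big(P_\theta\#\gamma_Y,\,P_\theta\#\gamma_Z\big)=\frac{1}{n}\sum_{k=1}^{n}\Big(\theta^\top y_{\tau_Y^\theta(k)}-\theta^\top z_{\tau_Z^\theta(k)}\Big)^2,
$$

where $\tau_Y^\theta$ and $\tau_Z^\theta$ are permutations sorting the projections of $Y$ and $Z$ along $\theta$. On any region of $Y$ where the sorting permutations for $\theta_1,\dots,\theta_p$ stay fixed, $\mathcal{E}_p$ is a quadratic function of $Y$, which is the source of the piecewise quadratic structure.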
Abstract
The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems share the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between two uniform discrete measures with the same number of points, as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples), and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence and a uniform Central Limit result on the process $\mathcal{E}_p(Y)$. Finally, we show that, in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.
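To make the Monte-Carlo energy concrete, here is a minimal NumPy sketch of $\mathcal{E}_p(Y)$ and of one almost-everywhere gradient, using the fact that one-dimensional optimal transport between uniform measures with $n$ points each matches sorted projections. The function names (`sample_directions`, `sw_energy`, `sgd_sketch`), the constant step size and the choice of one fresh direction per step are illustrative assumptions, not the authors' implementation or the exact schemes analysed in the paper.

```python
import numpy as np

def sample_directions(p, d, rng):
    """Draw p directions uniformly on the unit sphere of R^d (normalised Gaussians)."""
    theta = rng.standard_normal((p, d))
    return theta / np.linalg.norm(theta, axis=1, keepdims=True)

def sw_energy(Y, Z, theta):
    """Return E_p(Y) and a gradient valid wherever the sorting permutations are unique.

    Y, Z: (n, d) arrays of support points; theta: (p, d) array of unit directions.
    """
    n, p = Y.shape[0], theta.shape[0]
    proj_Y = Y @ theta.T                      # (n, p) projections of Y
    proj_Z = Z @ theta.T                      # (n, p) projections of Z
    order_Y = np.argsort(proj_Y, axis=0)      # sorting permutation of Y per direction
    sorted_Z = np.sort(proj_Z, axis=0)        # sorted projections of Z per direction
    # target[k, i]: projection of the point of Z matched to y_k along theta_i
    target = np.empty_like(proj_Y)
    np.put_along_axis(target, order_Y, sorted_Z, axis=0)
    diff = proj_Y - target
    energy = np.mean(diff ** 2)               # (1/p) sum_i (1/n) sum_k diff[k, i]^2
    grad = (2.0 / (n * p)) * diff @ theta     # (n, d), gradient on the current cell
    return energy, grad

def sgd_sketch(Y0, Z, steps, lr, rng):
    """Hypothetical SGD loop: one fresh random direction per step, constant step size."""
    Y = Y0.copy()
    for _ in range(steps):
        theta = sample_directions(1, Y.shape[1], rng)
        _, grad = sw_energy(Y, Z, theta)
        Y -= lr * grad
    return Y
```

For instance, with `rng = np.random.default_rng(0)`, calling `sw_energy(Y, Z, sample_directions(p, d, rng))` with the directions held fixed evaluates one realisation of $\mathcal{E}_p$, while `sgd_sketch` resamples a direction at every iteration and therefore targets $\mathcal{E}$ itself. The gradient formula only holds inside a cell where the sorting permutations do not change, which is consistent with the a.e. differentiability discussed in the abstract.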
