Properties of Discrete Sliced Wasserstein Losses

Eloi Tanguy; Rémi Flamary; Julie Delon

TL;DR

This work analyzes the energy landscape of the Sliced Wasserstein distance between two uniform discrete measures, focusing on the function $\mathcal{E}(Y)=\mathrm{SW}_2^2(\gamma_Y,\gamma_Z)$ and its Monte-Carlo surrogate $\mathcal{E}_p$. It establishes regularity (local Lipschitzness, a.e. differentiability), a cell-decomposed structure that renders $\mathcal{E}_p$ piecewise quadratic and semi-concave, and proves almost-sure uniform convergence of $\mathcal{E}_p$ to $\mathcal{E}$ with a uniform CLT for the approximation error. The paper then analyzes optimization landscapes: global optima correspond to exact SW equality, critical points satisfy a fixed-point relation, and as $p\to\infty$ the $\mathcal{E}_p$ critical points approximate those of $\mathcal{E}$ with explicit convergence rates. Building on this, it provides a rigorous framework for stochastic gradient methods, showing convergence of interpolated and noised SGD to Clarke critical points, and discusses generalizations to batching and barycenters. Numerical experiments corroborate the theory, illustrating the emergence and dilution of local optima with increasing $p$, and demonstrating SGD dynamics across dimensions. Overall, the results offer theoretical guarantees for optimizing sliced-Wasserstein energies in discrete settings and clarify how projection count and noise affect convergence toward the true SW landscape.
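As a rough illustration of the Monte-Carlo surrogate $\mathcal{E}_p$ described above, the sketch below averages one-dimensional squared 2-Wasserstein costs over $p$ projection directions. It relies on the standard fact that, for two uniform measures on the line with the same number of points, the optimal coupling matches sorted samples. The function names (`sample_directions`, `w2_squared_1d`, `sw_energy_mc`) and the choice of uniformly sampled directions on the unit sphere are assumptions made for this example, not the authors' code.

```python
import math
import random


def sample_directions(p, d, rng):
    """Draw p directions on the unit sphere of R^d by normalising Gaussian vectors."""
    thetas = []
    while len(thetas) < p:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:  # guard against the (measure-zero) null vector
            thetas.append([x / norm for x in g])
    return thetas


def project(points, theta):
    """Project each point of an n x d list of lists onto the direction theta."""
    return [sum(a * b for a, b in zip(pt, theta)) for pt in points]


def w2_squared_1d(u, v):
    """Squared W_2 between the uniform measures on u and v (same length).

    In one dimension the optimal matching pairs the sorted samples.
    """
    n = len(u)
    return sum((a - b) ** 2 for a, b in zip(sorted(u), sorted(v))) / n


def sw_energy_mc(Y, Z, thetas):
    """Monte-Carlo energy: mean over the directions of the projected 1D costs."""
    costs = [w2_squared_1d(project(Y, t), project(Z, t)) for t in thetas]
    return sum(costs) / len(costs)


if __name__ == "__main__":
    rng = random.Random(0)
    Z = [[0.0, -1.0], [0.0, 1.0]]
    Y = [[0.5, 0.2], [-0.5, -0.2]]
    print(sw_energy_mc(Y, Z, sample_directions(p=100, d=2, rng=rng)))
```

Passing the directions explicitly keeps them fixed across evaluations, which matches the viewpoint of $\mathcal{E}_p$ as a deterministic function of $Y$ once the $p$ projections have been drawn.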

Abstract

The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between two uniform discrete measures with the same number of points as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence and a uniform Central Limit result on the process $\mathcal{E}_p(Y)$. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.
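To make the stochastic gradient viewpoint concrete, here is a simplified sketch of SGD on $\mathcal{E}$: each iteration draws one direction $\theta$, matches the sorted projections of $Y$ and $Z$, and moves each $y_k$ along $\theta$ by the gradient of the projected cost, which at points of differentiability is $\frac{2}{n}(\theta \cdot y_k - \theta \cdot z_{\tau(k)})\,\theta$ with $\tau$ the sorted matching. The constant step size, the absence of a noise term and the helper names (`random_direction`, `sgd_step`, `sliced_sgd`) are assumptions of this example; it is not the exact scheme analysed in the paper.

```python
import math
import random


def random_direction(d, rng):
    """Draw one direction on the unit sphere of R^d (normalised Gaussian vector)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:
            return [x / norm for x in g]


def sgd_step(Y, Z, theta, step):
    """One gradient step on the projected cost W_2^2 along the direction theta."""
    n = len(Y)
    proj_y = [sum(a * b for a, b in zip(y, theta)) for y in Y]
    proj_z = [sum(a * b for a, b in zip(z, theta)) for z in Z]
    # Indices of Y and Z ordered by projection: the 1D optimal matching.
    order_y = sorted(range(n), key=proj_y.__getitem__)
    order_z = sorted(range(n), key=proj_z.__getitem__)
    new_Y = [list(y) for y in Y]
    for iy, iz in zip(order_y, order_z):
        residual = proj_y[iy] - proj_z[iz]
        for j, t_j in enumerate(theta):
            # Gradient of (1/n) * residual^2 with respect to y_iy is (2/n) * residual * theta.
            new_Y[iy][j] -= step * (2.0 / n) * residual * t_j
    return new_Y


def sliced_sgd(Y0, Z, n_iters, step, seed=0):
    """Run n_iters steps, drawing a fresh random direction at each iteration."""
    rng = random.Random(seed)
    Y = [list(y) for y in Y0]
    d = len(Z[0])
    for _ in range(n_iters):
        Y = sgd_step(Y, Z, random_direction(d, rng), step)
    return Y
```

Replacing the fresh draw by a direction picked from a fixed list of $p$ projections would give the analogous sketch for $\mathcal{E}_p$.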

Paper Structure (48 sections, 31 theorems, 135 equations, 18 figures, 2 algorithms)

Introduction
Sliced and Empirical Sliced Wasserstein Energies and their Regularities
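The discrete SW energies $\mathcal{E}$ and $\mathcal{E}_p$
Regularity properties of $\mathcal{E}_p$ and $\mathcal{E}$
Cell structure of $\mathcal{E}_p$
Consequences of the cell structure on the regularity of $\mathcal{E}_p$ and $\mathcal{E}$
Convergence of $\mathcal{E}_p$ to $\mathcal{E}$
Illustration in a simplified case
Properties of the Optimisation Landscapes of $\mathcal{E}$ and $\mathcal{E}_p$
Optimising $\mathcal{E}$
Global optima of $\mathcal{E}$
Critical points of $\mathcal{E}$
Optimising $\mathcal{E}_p$
Global optima of $\mathcal{E}_p$
Critical points of $\mathcal{E}_p$ and cell stability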
...and 33 more sections

Key Result

Lemma 2.1

Let $\alpha, \overline{\alpha} \in \Sigma_n$, $\beta, \overline{\beta} \in \Sigma_m$ and $C, \overline{C} \in \mathbb{R}_+^{n\times m}$. Denote by $\mathrm{W}(\alpha, \beta; C) := \underset{\pi \in \Pi(\alpha, \beta)}{\inf}\ \pi \cdot C$ the cost of the discrete Kantorovich problem of cost matrix $C$ ...
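For readers unfamiliar with the notation in this lemma, the block below spells out the usual conventions of discrete optimal transport. It assumes that $\Sigma_n$ denotes the probability simplex, $\Pi(\alpha, \beta)$ the set of couplings with marginals $\alpha$ and $\beta$, and $\pi \cdot C$ the entrywise inner product; these are standard definitions, not confirmed by the truncated excerpt above.

```latex
\Sigma_n := \Big\{ a \in \mathbb{R}_+^n \;:\; \sum_{i=1}^n a_i = 1 \Big\},
\qquad
\Pi(\alpha, \beta) := \Big\{ \pi \in \mathbb{R}_+^{n \times m} \;:\;
    \pi \mathbf{1}_m = \alpha,\ \pi^T \mathbf{1}_n = \beta \Big\},
\qquad
\pi \cdot C := \sum_{i=1}^n \sum_{j=1}^m \pi_{i,j}\, C_{i,j}.
```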

Figures (18)

Figure 1: Comparison between Sliced Wasserstein (a) and Wasserstein (b) landscapes for 2-point discrete measures $Y = (y, -y)^T$ and $Z = (z_1,z_2)^T$ with $z_1 = (0,-1)^T$ and $z_2 = (0,1)^T$.
Figure 2: The landscape $\mathcal{E}_p$ approaches $\mathcal{E}$ as $p$ increases, but introduces numerous strict local optima. Notice that when $p$ is too small ($p=1\leq d$ in particular), $\mathcal{E}_p$ even introduces other global optima.
Figure 3: Illustration of the cell structure for $p=4$ in dimension 2 from a Block Coordinate Descent (BCD) viewpoint. On the left, we view different points $Y = (y_1, y_2)$ (in red and orange) and the minima of their respective quadratics: $(y_1^*, y_2^*)$, which should be compared to the original points $(z_1, z_2)$ in purple. On the right, we view the cell structure depending on the position of $y_2 - y_1 \in \mathbb{R}^2$, since the cell conditions only depend on this difference (see \ref{eqn:2cell_polytope}). We can see that in this example all cells are stable, thus there are three strict local optima of $\mathcal{E}_p$ in addition to the global optimum. The $(y_1, y_2)$ pair "0" is sent to $(z_2, z_1)$, while the pair "1" is sent to a local optimum, and the pair "2" is sent to $(z_1, z_2)$.
Figure 4: The stars, circles and squares are the Clarke critical points of $x \mapsto \mathcal{E}_p(X = (-x, x)^T),\; x \in \mathbb{R}^2$ for $p=3$. The squares do not correspond to local optima of $\mathcal{E}_p$, and are unlikely to be reached numerically. The circles and stars correspond to local optima of $\mathcal{E}_p$: the stars correspond to the global optima and satisfy the desired results $\mathcal{E}_p = 0$, while the circles are strict local optima.
Figure 5: BCD on $\mathcal{E}_p$ with different initial positions $Y^{(0)}$, with fixed projections (first sample). Each of the two points of the trajectory $Y^{(t)} = (y_1^{(t)}, y_2^{(t)})$ is coloured with respect to the point of the original measure $\gamma_Z$ to which they converge.
...and 13 more figures
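The two-point setting of Figure 1 can be worked out by hand, which helps when comparing the two landscapes. The block below is a short derivation under the assumption that SW is defined with the uniform measure $\sigma$ on the unit circle; it uses $z_1 = -z_2$, the fact that the projected measures are symmetric two-point measures, and the fact that an optimal plan between uniform measures with the same number of points can be taken to be a permutation.

```latex
% Projections along \theta \in \mathbb{S}^1 of \gamma_Y and \gamma_Z, with Y = (y, -y)^T and z_1 = -z_2:
P_\theta \# \gamma_Y = \tfrac{1}{2}\big(\delta_{\theta \cdot y} + \delta_{-\theta \cdot y}\big),
\qquad
P_\theta \# \gamma_Z = \tfrac{1}{2}\big(\delta_{\theta \cdot z_2} + \delta_{-\theta \cdot z_2}\big).

% Sorted matching in 1D pairs -|\theta \cdot y| with -|\theta \cdot z_2| and |\theta \cdot y| with |\theta \cdot z_2|:
\mathrm{W}_2^2\big(P_\theta \# \gamma_Y, P_\theta \# \gamma_Z\big)
  = \big(|\theta \cdot y| - |\theta \cdot z_2|\big)^2,
\qquad
\mathcal{E}(Y) = \int_{\mathbb{S}^1} \big(|\theta \cdot y| - |\theta \cdot z_2|\big)^2 \, \mathrm{d}\sigma(\theta).

% Wasserstein counterpart: minimum over the two possible permutations.
\mathrm{W}_2^2(\gamma_Y, \gamma_Z) = \min\big(\|y - z_2\|^2,\ \|y + z_2\|^2\big).
```

Both quantities vanish exactly when $y = \pm z_2$, that is, when $\gamma_Y = \gamma_Z$.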

Theorems & Definitions (55)

Remark 2.1
Lemma 2.1: Stability of the Wasserstein cost
Remark 2.2
Remark 2.3
Proposition 2.1
proof
Theorem 2.1
proof
Theorem 2.2: Regularity of $\mathcal{E}$, from Bonneel et al. [bonneel2015sliced], Theorem 1
Remark 2.4
...and 45 more
