Markovian Sliced Wasserstein Distances: Beyond Independent Projections

Khai Nguyen; Tongzheng Ren; Nhat Ho

TL;DR

Markovian sliced Wasserstein (MSW) imposes a first-order Markov structure on projection directions to reduce the redundancy of the independent projections used by sliced Wasserstein (SW). It is defined via $\text{MSW}_{p,T}^p(\mu,\nu) = \mathbb{E}_{\theta_{1:T} \sim \sigma(\theta_{1:T})}\left[ \frac{1}{T} \sum_{t=1}^T W_p^p(\theta_t \sharp \mu, \theta_t \sharp \nu) \right]$, where $\sigma(\theta_{1:T})$ is the joint distribution of the $T$ projecting directions, with variants arising from orthogonal-based and input-aware transitions, plus a burning and thinning technique to reduce computation. The authors establish metricity under mild assumptions, equivalence with weak convergence, sample complexity bounds, a Monte Carlo error analysis, and computational and memory complexity results, and report improved performance over SW and prior variants in gradient flows, color transfer, and deep generative modeling on standard datasets. The framework offers a scalable, theoretically grounded way to compare high-dimensional distributions while mitigating projection redundancy.
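The definition above can be approximated by Monte Carlo: simulate several chains of directions, project both sample sets onto each direction, and average the one-dimensional Wasserstein costs. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the chain count `L`, and in particular the `random_walk_transition` kernel (Gaussian perturbation followed by renormalization) are hypothetical stand-ins for the orthogonal-based and input-aware transitions mentioned in the summary. It also assumes both samples have the same number of points, so that the 1-D cost reduces to sorting.

```python
import numpy as np


def wasserstein_1d_pp(x, y, p=2):
    """W_p^p between two 1-D empirical measures with the same number of atoms."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)) ** p)


def random_walk_transition(theta, rng, step=0.5):
    """Hypothetical transition kernel: perturb theta, then project to the sphere."""
    new = theta + step * rng.standard_normal(theta.shape)
    return new / np.linalg.norm(new)


def msw(X, Y, T=5, L=10, p=2, seed=0):
    """Monte Carlo estimate of MSW_{p,T} from L independent chains of length T."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(L):
        # Prior: uniform direction on the unit sphere (an assumption of this sketch).
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        for _ in range(T):
            total += wasserstein_1d_pp(X @ theta, Y @ theta, p)
            # First-order Markov step: the next direction depends only on the current one.
            theta = random_walk_transition(theta, rng)
    return (total / (L * T)) ** (1.0 / p)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 3))
    Y = rng.standard_normal((100, 3)) + 2.0
    print("MSW estimate:", msw(X, Y))
```

Setting `T=1` removes the transition step's effect on the estimate, so the sketch then reduces to a standard Monte Carlo SW estimate with `L` independent projections.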

Abstract

Sliced Wasserstein (SW) distance suffers from redundant projections due to independent uniform random projecting directions. To partially overcome the issue, the max-K sliced Wasserstein (Max-K-SW) distance ($K \geq 1$) seeks the most discriminative orthogonal projecting directions. Although Max-K-SW reduces the number of projections, its metricity cannot be guaranteed in practice because the optimization may not reach the optimum. Moreover, the orthogonality constraint is computationally expensive and might not be effective. To address these problems, we introduce a new family of SW distances, named Markovian sliced Wasserstein (MSW) distance, which imposes a first-order Markov structure on projecting directions. We discuss various members of MSW by specifying the Markov structure, including the prior distribution, the transition distribution, and the burning and thinning technique. Moreover, we investigate the theoretical properties of MSW, including topological properties (metricity, weak convergence, and connection to other distances), statistical properties (sample complexity and Monte Carlo estimation error), and computational properties (computational complexity and memory complexity). Finally, we compare MSW distances with previous SW variants in various applications, such as gradient flows, color transfer, and deep generative modeling, to demonstrate the favorable performance of MSW.

Paper Structure (26 sections, 11 theorems, 39 equations, 8 figures, 5 tables, 8 algorithms)

Introduction
Background
Markovian Sliced Wasserstein distances
Definitions, Topological, Statistical, and Computational Properties
Specific Choices of Projecting Distributions
Burning and Thinning
Experiments
Gradient Flows and Color Transfer
Deep Generative Models
Conclusion
Additional Materials
Background on Sliced Wasserstein Variants
Von Mises-Fisher Distribution
Algorithms for Computing Markovian Sliced Wasserstein Distances
Burned Thinned Markovian Sliced Wasserstein Distance
...and 11 more sections

Key Result

Theorem 1

For any $p \geq 1$, $T \geq 1$, and dimension $d \geq 1$, if A1 holds, Markovian sliced Wasserstein $\text{MSW}_{p,T}(\cdot,\cdot)$ is a valid metric on the space of probability measures $\mathcal{P}_p(\mathbb{R}^d)$, namely, it satisfies the (i) non-negativity, (ii) symmetry, (iii) triangle inequality, ...
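
A brief sketch of why properties (i) to (iii) are plausible, reasoned directly from the definition of MSW and not taken from the paper's proof (the intermediate measure $\eta$ is notation introduced here): non-negativity and symmetry are inherited from $W_p$ on each projected pair, since an average of non-negative, symmetric terms is itself non-negative and symmetric. For the triangle inequality, apply the triangle inequality of $W_p$ inside the expectation and then Minkowski's inequality over the joint average in $\theta_{1:T}$ and $t$:

$$
\begin{aligned}
\text{MSW}_{p,T}(\mu,\nu)
&= \left( \mathbb{E}\left[ \frac{1}{T} \sum_{t=1}^T W_p^p(\theta_t \sharp \mu, \theta_t \sharp \nu) \right] \right)^{1/p} \\
&\leq \left( \mathbb{E}\left[ \frac{1}{T} \sum_{t=1}^T \big( W_p(\theta_t \sharp \mu, \theta_t \sharp \eta) + W_p(\theta_t \sharp \eta, \theta_t \sharp \nu) \big)^p \right] \right)^{1/p} \\
&\leq \text{MSW}_{p,T}(\mu,\eta) + \text{MSW}_{p,T}(\eta,\nu).
\end{aligned}
$$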

Figures (8)

Figure 1: Gradient flows from the empirical distribution over the color points to the empirical distribution over S-shape points. The Wasserstein-2 distance between the empirical distribution at the current step and the S-shape distribution, and the computational time (in seconds) to reach that step, are reported at the top of the figure.
Figure 2: The source image, the target image, and the transferred images from different distances. The Wasserstein-2 distance between the empirical distribution over the transferred color palettes and the empirical distribution over the target color palette, and the computational time (in seconds), are reported at the top of the figure.
Figure 3: The FID scores and the IS scores over epochs, and some generated images from CelebA.
Figure 4: Gradient flows from the empirical distribution over the color points to the empirical distribution over S-shape points, produced by different distances. The Wasserstein-2 distance between the empirical distribution at the current step and the S-shape distribution, and the computational time (in seconds) to reach that step, are reported at the top of the figure.
Figure 5: The source image, the target image, and the transferred images from different distances. The Wasserstein-2 distance between the empirical distribution over the transferred color palettes and the empirical distribution over the target color palette, and the computational time (in seconds), are reported at the top of the figure. The color palettes are given below the corresponding images.
...and 3 more figures

Theorems & Definitions (15)

Definition 1
Theorem 1: Metricity
Theorem 2: Weak Convergence
Proposition 1
Proposition 2: Sample Complexity
Proposition 3: Monte Carlo error
Definition 2
Definition 3
Proposition 4
Proposition 5: Weak Convergence
...and 5 more
