Energy-Based Sliced Wasserstein Distance

Khai Nguyen; Nhat Ho

TL;DR

The paper addresses the limitations of fixed or optimization-based slicing in sliced Wasserstein metrics by introducing an energy-based, parameter-free slicing distribution that weights directions by an increasing function of the projected 1D Wasserstein distance. It defines the Energy-Based Sliced Wasserstein (EBSW) distance, proves its semi-metric properties and its connection to SW and Max-SW, and establishes that EBSW preserves weak convergence with a favorable sample complexity that avoids the curse of dimensionality. The authors develop three practical estimators, based on Importance Sampling (IS), Sampling Importance Resampling (SIR), and Metropolis-Hastings Markov chain Monte Carlo (MCMC), and analyze their computational properties, including unbiasedness for $\text{EBSW}_p^p$ under IS. Empirical results on point-cloud gradient flows, color transfer, and deep point-cloud reconstruction show that EBSW variants, especially IS-EBSW with exponential energy, outperform SW, Max-SW, and DSW in convergence speed and final accuracy while keeping comparable computational costs. This approach provides a robust, discriminative, optimization-free alternative for comparing high-dimensional distributions, with broad applicability to geometric learning tasks.
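For reference, the slicing distribution, the distance, and an importance-sampling estimate described above can be written as follows. This is a sketch reconstructed from the TL;DR and abstract: the notation ($\theta\sharp\mu$ for the pushforward of $\mu$ under $x \mapsto \theta^\top x$, $\mathbb{S}^{d-1}$ for the unit sphere, $f$ a positive increasing energy function) and the choice of a uniform proposal in the last line are our assumptions, and the paper's exact definitions may differ in detail.

```latex
\sigma_{\mu,\nu}(\theta; f, p)
  = \frac{f\big(W_p^p(\theta\sharp\mu, \theta\sharp\nu)\big)}
         {\int_{\mathbb{S}^{d-1}} f\big(W_p^p(\vartheta\sharp\mu, \vartheta\sharp\nu)\big)\, d\vartheta},
  \qquad \theta \in \mathbb{S}^{d-1},

\text{EBSW}_p(\mu, \nu; f)
  = \Big( \mathbb{E}_{\theta \sim \sigma_{\mu,\nu}(\theta; f, p)}
      \big[ W_p^p(\theta\sharp\mu, \theta\sharp\nu) \big] \Big)^{1/p},

\widehat{\text{EBSW}}{}_p^p(\mu, \nu; f)
  = \sum_{l=1}^{L}
      \frac{f\big(W_p^p(\theta_l\sharp\mu, \theta_l\sharp\nu)\big)}
           {\sum_{i=1}^{L} f\big(W_p^p(\theta_i\sharp\mu, \theta_i\sharp\nu)\big)}
      \, W_p^p(\theta_l\sharp\mu, \theta_l\sharp\nu),
  \qquad \theta_1, \dots, \theta_L \overset{\text{i.i.d.}}{\sim} \mathcal{U}(\mathbb{S}^{d-1}).
```

Because $f$ is increasing, the normalized weights are ordered like the projected distances, so the estimate lies between the plain average over the sampled directions (a Monte Carlo SW estimate) and their maximum. This is consistent with the connection to SW and Max-SW mentioned above.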

Abstract

The sliced Wasserstein (SW) distance has been widely recognized as a statistically effective and computationally efficient metric between two probability measures. A key component of the SW distance is the slicing distribution. There are two existing approaches for choosing this distribution. The first approach is to use a fixed prior distribution. The second approach is to optimize for the best distribution within a parametric family, namely the one that maximizes the expected distance. However, both approaches have limitations. A fixed prior distribution is non-informative: it does not highlight the projecting directions that discriminate between two general probability measures. Optimizing for the best distribution is often expensive and unstable. Moreover, the parametric family of candidate distributions can easily be misspecified. To address these issues, we propose to design the slicing distribution as an energy-based distribution that is parameter-free and has a density proportional to an energy function of the projected one-dimensional Wasserstein distance. We then derive a novel sliced Wasserstein metric, the energy-based sliced Wasserstein (EBSW) distance, and investigate its topological, statistical, and computational properties, the latter via importance sampling, sampling importance resampling, and Markov chain Monte Carlo methods. Finally, we conduct experiments on point-cloud gradient flow, color transfer, and point-cloud reconstruction to show the favorable performance of EBSW.
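To make the importance-sampling and sampling-importance-resampling estimators concrete, here is a minimal NumPy sketch for two point clouds. It assumes equal-size, uniformly weighted empirical measures (so each projected 1D Wasserstein distance reduces to matching sorted projections), a uniform proposal on the sphere, and the exponential energy $f(x) = e^x$ passed as its logarithm. All function names (`sample_sphere`, `projected_wpp`, `softmax`, `sw`, `is_ebsw`, `sir_ebsw`) are hypothetical and not taken from the authors' code.

```python
import numpy as np


def sample_sphere(L, d, rng):
    """Draw L directions uniformly from the unit sphere S^{d-1}."""
    theta = rng.standard_normal((L, d))
    return theta / np.linalg.norm(theta, axis=1, keepdims=True)


def projected_wpp(X, Y, theta, p=2):
    """W_p^p between 1D projections of two equal-size, uniformly weighted clouds.

    With n atoms of mass 1/n on each side, the optimal 1D coupling pairs the
    sorted projections, so W_p^p is the mean of |x_(i) - y_(i)|^p.
    Returns one value per direction, shape (L,).
    """
    x_proj = np.sort(X @ theta.T, axis=0)  # shape (n, L)
    y_proj = np.sort(Y @ theta.T, axis=0)
    return np.mean(np.abs(x_proj - y_proj) ** p, axis=0)


def softmax(z):
    """Numerically stable normalization of exp(z)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def sw(X, Y, L=100, p=2, seed=None):
    """Monte Carlo SW_p: uniform average over L random directions."""
    rng = np.random.default_rng(seed)
    w = projected_wpp(X, Y, sample_sphere(L, X.shape[1], rng), p)
    return float(np.mean(w)) ** (1.0 / p)


def is_ebsw(X, Y, L=100, p=2, log_energy=lambda w: w, seed=None):
    """Self-normalized IS estimate of EBSW_p with a uniform proposal.

    log_energy is log f; the default (identity) gives the exponential energy
    f(x) = exp(x), so the weights are a softmax of the W_p^p values.
    """
    rng = np.random.default_rng(seed)
    w = projected_wpp(X, Y, sample_sphere(L, X.shape[1], rng), p)
    weights = softmax(log_energy(w))
    return float(np.sum(weights * w)) ** (1.0 / p)


def sir_ebsw(X, Y, L=100, p=2, log_energy=lambda w: w, seed=None):
    """SIR estimate: resample the L proposals by their IS weights, then average."""
    rng = np.random.default_rng(seed)
    w = projected_wpp(X, Y, sample_sphere(L, X.shape[1], rng), p)
    weights = softmax(log_energy(w))
    idx = rng.choice(L, size=L, replace=True, p=weights)
    return float(np.mean(w[idx])) ** (1.0 / p)


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
Y = rng.normal(loc=1.0, size=(500, 3))
print(sw(X, Y, seed=1), is_ebsw(X, Y, seed=1), sir_ebsw(X, Y, seed=1))
```

Passing the same `seed` makes `sw` and `is_ebsw` use identical directions, so the IS estimate can be compared directly with the uniform average; working with `log f` and a shifted softmax avoids overflow when the projected distances are large.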

Paper Structure (29 sections, 4 theorems, 52 equations, 8 figures, 6 tables, 7 algorithms)

Introduction
Background
Energy-Based Sliced Wasserstein Distance
Energy-Based Slicing Distribution
Definitions, Topological, and Statistical Properties of Energy Based Sliced Wasserstein
Computational Methods and Computational Properties
Importance Sampling
Sampling Importance Resampling and Markov Chain Monte Carlo
Experiments
Visualization of energy-based slicing distribution
Point-Cloud Gradient Flows
Color Transfer
Deep Point-Cloud Reconstruction
Limitations and Conclusion
Proofs
...and 14 more sections

Key Result

Theorem 1

For any $p \geq 1$ and any energy function $f$, the energy-based sliced Wasserstein distance $\text{EBSW}_{p}(\cdot,\cdot;f)$ is a semi-metric on the space of probability measures on $\mathbb{R}^{d}$; namely, EBSW satisfies non-negativity, symmetry, and the identity of indiscernibles.
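The three properties in Theorem 1 can be sanity-checked numerically on empirical measures. The sketch below is an illustration, not a proof: it assumes equal-size uniformly weighted point clouds, the exponential energy, and a fixed shared set of directions (so symmetry holds for the estimator itself); the helper names `projected_wpp` and `is_ebsw_shared`, the sample sizes, and the seeds are hypothetical choices. A row-shuffled copy of a cloud represents the same measure, so its estimated distance should vanish up to rounding.

```python
import numpy as np


def projected_wpp(X, Y, theta, p=2):
    """Per-direction W_p^p between 1D projections of equal-size point clouds."""
    x_proj = np.sort(X @ theta.T, axis=0)
    y_proj = np.sort(Y @ theta.T, axis=0)
    return np.mean(np.abs(x_proj - y_proj) ** p, axis=0)


def is_ebsw_shared(X, Y, theta, p=2):
    """IS-EBSW_p with exponential energy, using a fixed set of directions theta."""
    w = projected_wpp(X, Y, theta, p)
    weights = np.exp(w - w.max())  # stable exp-energy weights
    weights /= weights.sum()
    return float(np.sum(weights * w)) ** (1.0 / p)


rng = np.random.default_rng(0)
d, n, L = 3, 300, 200
theta = rng.standard_normal((L, d))
theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform on S^{d-1}

X = rng.normal(size=(n, d))
Y = rng.normal(loc=0.5, size=(n, d))
X_shuffled = X[rng.permutation(n)]  # same empirical measure, different row order

d_xy = is_ebsw_shared(X, Y, theta)
d_yx = is_ebsw_shared(Y, X, theta)
d_xx = is_ebsw_shared(X, X_shuffled, theta)

print(f"EBSW(X, Y)  = {d_xy:.4f}")
print(f"EBSW(Y, X)  = {d_yx:.4f}")
print(f"EBSW(X, X') = {d_xx:.2e}")
```

A finite-sample check like this can only illustrate the "if" direction of the identity of indiscernibles; the "only if" direction (zero distance implies equal measures) is what the theorem itself establishes.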

Figures (8)

Figure 1: Visualization of the true and the sampled energy-based slicing distributions, the optimal vMF distribution from the v-DSW, and the max projecting direction from the Max-SW.
Figure 2: Gradient flows from the SW, the Max-SW, the v-DSW, and the IS-EBSW-e in turn.
Figure 3: The figures show the source image, the target image, the transferred images from sliced Wasserstein variants, the corresponding Wasserstein-2 distances to the target color palette, and the computational time.
Figure 4: Gradient flows from the SW, the Max-SW, the v-DSW, the IS-EBSW-e, the SIR-EBSW-e, the IMH-EBSW-e, and the RMH-EBSW-e in turn.
Figure 5: The first two rows use $L=100$ and the last row uses $L=10$; (c) denotes the "parameter-copy" estimator (the SIR-EBSW-e, the IMH-EBSW-e, and the RMH-EBSW always use the "parameter-copy" estimator since the conventional estimator is not stable for them).
...and 3 more figures

Theorems & Definitions (9)

Definition 1
Example 1
Definition 2
Theorem 1
Proposition 1
Theorem 2
Proposition 2
Definition 3
Definition 4
