A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions

Sharath Raghvendra, Pouyan Shirzadian, Kaiyi Zhang

TL;DR

The paper introduces the $(p,k)$-RPW (Robust Partial $p$-Wasserstein) distance, defined as $\Pi_{p,k}(\mu,\nu)=\inf\{\varepsilon\ge 0: W_{p,1-\varepsilon}(\mu,\nu)\le k\varepsilon\}$: the smallest mass $\varepsilon$ that can be left untransported such that the partial $p$-Wasserstein distance for the remaining $1-\varepsilon$ mass is at most $k\varepsilon$. The paper proves that RPW is a metric, relates it to the OT-profile, and shows that it interpolates between total variation and $p$-Wasserstein, which makes it robust to outliers and sampling discrepancies. The authors establish dimension-dependent convergence rates for empirical RPW, propose two algorithms for computing it, and report higher image retrieval accuracy than the $1$-Wasserstein, $2$-Wasserstein, and TV distances on noisy real-world datasets. These results offer a principled dissimilarity for comparing distributions under realistic noise and sampling conditions, with potential impact on tasks like GAN loss design and distributional barycenters.
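To make the definition concrete, here is a minimal sketch for small discrete distributions. It is not one of the paper's two algorithms: it assumes finite $p$, assumes $W_{p,m}$ is the $p$-th root of the cheapest cost (under $c^p$) of a partial plan moving mass $m$, solves that partial transport problem as a linear program with SciPy's `linprog`, and bisects on $\varepsilon$. The function names `partial_wasserstein` and `rpw` and the tolerances are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def partial_wasserstein(a, b, cost, p, mass):
    """Assumed definition: p-th root of the min cost of moving `mass` from a to b."""
    n, m = cost.shape
    if mass <= 0.0:
        return 0.0
    # The plan is flattened row-major; its row sums may not exceed a,
    # its column sums may not exceed b, and its total must equal `mass`.
    row_sums = np.kron(np.eye(n), np.ones((1, m)))
    col_sums = np.kron(np.ones((1, n)), np.eye(m))
    res = linprog(
        c=(cost ** p).ravel(),
        A_ub=np.vstack([row_sums, col_sums]),
        b_ub=np.concatenate([a, b]),
        A_eq=np.ones((1, n * m)),
        b_eq=[mass],
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return max(res.fun, 0.0) ** (1.0 / p)


def rpw(a, b, cost, p=2, k=1.0, tol=1e-7):
    """Bisection for inf{eps >= 0 : W_{p,1-eps}(a, b) <= k * eps}."""

    def feasible(eps):
        w = partial_wasserstein(a, b, cost, p, 1.0 - eps)
        # Compare p-th powers with a small slack to absorb LP round-off.
        return w ** p <= (k * eps) ** p + 1e-9

    if feasible(0.0):
        return 0.0
    lo, hi = 0.0, 1.0  # eps = 1 is always feasible: nothing is transported
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Toy example on the unit-diameter space {0, 1}: mu sits at 0,
# nu places an outlier mass of 0.1 at 1.
x = np.array([0.0, 1.0])
cost = np.abs(x[:, None] - x[None, :])
mu = np.array([1.0, 0.0])
nu = np.array([0.9, 0.1])

w2 = partial_wasserstein(mu, nu, cost, p=2, mass=1.0)  # full 2-Wasserstein
rpw_21 = rpw(mu, nu, cost, p=2, k=1.0)
```

Bisection is valid here because $W_{p,1-\varepsilon}$ does not increase as $\varepsilon$ grows while $k\varepsilon$ does not decrease, so the feasible values of $\varepsilon$ form an interval ending at $1$. In this toy example $W_{2,1-\varepsilon}=\sqrt{0.1-\varepsilon}$ for $\varepsilon<0.1$, so with $k=1$ the distance solves $\varepsilon^2+\varepsilon-0.1=0$, which gives roughly $0.0916$. The full $2$-Wasserstein distance is $\sqrt{0.1}\approx 0.316$, so the outlier moves the RPW value far less.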

Abstract

The $2$-Wasserstein distance is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the $2$-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical $2$-Wasserstein distance on $n$ samples in $\mathbb{R}^2$ to converge to the true distance at a rate of $n^{-1/4}$, which is significantly slower than the rate of $n^{-1/2}$ for $1$-Wasserstein distance. We introduce a new family of distances parameterized by $k \ge 0$, called $k$-RPW, which is based on computing the partial $2$-Wasserstein distance. We show that (1) $k$-RPW satisfies the metric properties, (2) $k$-RPW is robust to small outlier mass while retaining the sensitivity of $2$-Wasserstein distance to minor geometric differences, and (3) when $k$ is a constant, $k$-RPW distance between empirical distributions on $n$ samples in $\mathbb{R}^2$ converges to the true distance at a rate of $n^{-1/3}$, which is faster than the convergence rate of $n^{-1/4}$ for the $2$-Wasserstein distance. Using the partial $p$-Wasserstein distance, we extend our distance to any $p \in [1,\infty]$. By setting parameters $k$ or $p$ appropriately, we can reduce our distance to the total variation, $p$-Wasserstein, and the Lévy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy in comparison to the $1$-Wasserstein, $2$-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.

Paper Structure

This paper contains 25 sections, 24 theorems, 65 equations, 7 figures.

Key Result

Theorem 2.1

Given a metric space $({\mathcal{X}}, c)$ with a unit diameter and any parameters $p\ge 1$ and $k\ge 0$, the $(p,k)$-RPW distance function $\Pi_{p, k}(\cdot, \cdot)$ for all probability distributions defined over $({\mathcal{X}}, c)$ is a metric.

Figures (7)

  • Figure 1: Interpretations of different distance functions.
  • Figure 2: Interpretation of distances based on the OT-profile.
  • Figure 3: The triangle inequality of the RPW distance function.
  • Figure 4: (a) A distribution $\mu$ (shaded gray area) and an empirical distribution $\mu_n$ (red dots), (b) $\gamma_1$ transports as much mass as possible inside each cell of ${\mathcal{G}}_1$, (c) for the remaining mass, $\gamma_2$ transports as much remaining mass as possible inside the cells of ${\mathcal{G}}_2$, and (d) the transport plan $\gamma$, which is the sum of $\gamma_1$ and $\gamma_2$.
  • Figure 5: The results of our experiments on image retrieval on (left column) MNIST dataset and (right column) CIFAR-10 dataset.
  • ...and 2 more figures

Theorems & Definitions (34)

  • Theorem 2.1
  • Lemma 2.1
  • Theorem 3.1
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Lemma 3.3
  • Lemma 3.3
  • Theorem 3.4
  • Lemma 4.0
  • ...and 24 more