A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions

Sharath Raghvendra; Pouyan Shirzadian; Kaiyi Zhang

TL;DR

The paper introduces the $(p,k)$-Robust Partial $p$-Wasserstein (RPW) distance, defined as $\Pi_{p,k}(\mu,\nu)=\inf\{\varepsilon\ge 0: W_{p,1-\varepsilon}(\mu,\nu)\le k\varepsilon\}$, where $W_{p,1-\varepsilon}$ is the partial $p$-Wasserstein distance that transports only $1-\varepsilon$ of the mass. The definition trades transport cost against the amount of mass left untransported. The paper proves that RPW is a metric, relates it to the OT-profile, and shows that it interpolates between total variation and $p$-Wasserstein, which makes it robust to outliers and sampling discrepancies. The authors establish dimension-dependent convergence rates for empirical RPW, propose two algorithms for computing it, and report higher image retrieval accuracy than the $1$-Wasserstein, $2$-Wasserstein, and TV distances on noisy real-world datasets. These results offer a principled dissimilarity for comparing distributions under realistic noise and sampling conditions, with potential impact on tasks like GAN loss design and distributional barycenters.
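To make the definition concrete, here is a minimal Python sketch that evaluates $\Pi_{p,k}$ by bisection on $\varepsilon$. It is not one of the paper's two algorithms. It relies only on the fact that $W_{p,1-\varepsilon}$ is non-increasing in $\varepsilon$ while $k\varepsilon$ is non-decreasing, so the feasible set is an interval ending at $1$. The function names `rpw` and `partial_w_two_point` are hypothetical, and the partial distance is supplied in closed form for an assumed toy case: $\mu$ is a point mass at $x$, and $\nu$ places mass $1-\delta$ at $x$ and an outlier mass $\delta$ at $y$, with $c(x,y)=d\le 1$.

```python
def partial_w_two_point(eps, delta, d, p):
    """Partial p-Wasserstein distance W_{p,1-eps} for the assumed toy case:
    mu = point mass at x; nu = (1 - delta) at x plus delta at y; c(x, y) = d.
    Mass 1 - delta can stay at x for free, so only (delta - eps) must travel."""
    moved = max(delta - eps, 0.0)
    return (moved * d ** p) ** (1.0 / p)


def rpw(partial_w, k, tol=1e-12):
    """Bisection for inf{eps in [0, 1] : partial_w(eps) <= k * eps}.
    Assumes partial_w is non-increasing in eps and partial_w(1) == 0."""
    if partial_w(0.0) <= 0.0:
        return 0.0  # the distributions already match at zero cost
    lo, hi = 0.0, 1.0  # lo is infeasible, hi is feasible
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if partial_w(mid) <= k * mid:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    delta, d = 0.1, 1.0
    for p, k in [(1, 1.0), (2, 1.0), (2, 0.0)]:
        value = rpw(lambda e: partial_w_two_point(e, delta, d, p), k)
        print(f"p={p}, k={k}: RPW ~ {value:.6f}")
```

In this toy case the condition for $p=1$, $k=1$, $d=1$ reads $\delta-\varepsilon\le\varepsilon$, so the distance is $\delta/2$. For $k=0$ the condition forces the partial cost to vanish, so the distance is $\delta$, which is the total variation distance between $\mu$ and $\nu$. In every case the result is at most $\delta$, whereas the $p$-Wasserstein distance here is $d\,\delta^{1/p}$.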

Abstract

The $2$-Wasserstein distance is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the $2$-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical $2$-Wasserstein distance on $n$ samples in $\mathbb{R}^2$ to converge to the true distance at a rate of $n^{-1/4}$, which is significantly slower than the rate of $n^{-1/2}$ for the $1$-Wasserstein distance. We introduce a new family of distances parameterized by $k \ge 0$, called $k$-RPW, which is based on computing the partial $2$-Wasserstein distance. We show that (1) $k$-RPW satisfies the metric properties, (2) $k$-RPW is robust to small outlier mass while retaining the sensitivity of the $2$-Wasserstein distance to minor geometric differences, and (3) when $k$ is a constant, the $k$-RPW distance between empirical distributions on $n$ samples in $\mathbb{R}^2$ converges to the true distance at a rate of $n^{-1/3}$, which is faster than the convergence rate of $n^{-1/4}$ for the $2$-Wasserstein distance. Using the partial $p$-Wasserstein distance, we extend our distance to any $p \in [1,\infty]$. By setting parameters $k$ or $p$ appropriately, we can reduce our distance to the total variation, $p$-Wasserstein, and the Lévy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy than the $1$-Wasserstein, $2$-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.
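As a rough illustration of what the quoted rates mean in practice, the arithmetic below treats each rate as exact, ignoring constants and any logarithmic factors (a simplifying assumption, not a statement from the paper). It gives the error scale at $n = 10^6$ samples in $\mathbb{R}^2$ and the sample size at which each rate reaches an error scale of $10^{-2}$.

$$
\begin{aligned}
\text{$2$-Wasserstein:}\quad & n^{-1/4}\big|_{n=10^{6}} = 10^{-1.5} \approx 0.032, & n^{-1/4} = 10^{-2} &\iff n = 10^{8},\\
\text{$k$-RPW (constant $k$):}\quad & n^{-1/3}\big|_{n=10^{6}} = 10^{-2}, & n^{-1/3} = 10^{-2} &\iff n = 10^{6},\\
\text{$1$-Wasserstein:}\quad & n^{-1/2}\big|_{n=10^{6}} = 10^{-3}, & n^{-1/2} = 10^{-2} &\iff n = 10^{4}.
\end{aligned}
$$

Under this simplification, $k$-RPW needs about a hundred times fewer samples than the $2$-Wasserstein distance to reach the same error scale.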

Paper Structure (25 sections, 24 theorems, 65 equations, 7 figures)

This paper contains 25 sections, 24 theorems, 65 equations, 7 figures.

Introduction
Our Results:
Notations.
Robust Partial $p$-Wasserstein Metric
Robustness Properties
Robustness to Outlier Noise
Robustness to Sampling Discrepancies.
Extension to arbitrary diameter.
Relation to Other Distances
Lévy-Prokhorov distance.
Total Variation.
$p$-Wasserstein distance.
Algorithms to Compute $(p,k)$-RPW
Experimental Results
Image Retrieval.
...and 10 more sections

Key Result

Theorem 2.1

Given a metric space $({\mathcal{X}}, c)$ with a unit diameter and any parameters $p\ge 1$ and $k\ge 0$, the $(p,k)$-RPW distance function $\Pi_{p, k}(\cdot, \cdot)$ for all probability distributions defined over $({\mathcal{X}}, c)$ is a metric.

Figures (7)

Figure 1: Interpretations of different distance functions.
Figure 2: Interpretation of distances based on the OT-profile.
Figure 3: The triangle inequality of the RPW distance function.
Figure 4: (a) A distribution $\mu$ (shaded gray area) and an empirical distribution $\mu_n$ (red dots), (b) $\gamma_1$ transports as much mass as possible inside each cell of ${\mathcal{G}}_1$, (c) for the remaining mass, $\gamma_2$ transports as much remaining mass as possible inside the cells of ${\mathcal{G}}_2$, and (d) the transport plan $\gamma$, which is the sum of $\gamma_1$ and $\gamma_2$.
Figure 5: The results of our experiments on image retrieval on (left column) MNIST dataset and (right column) CIFAR-10 dataset.
...and 2 more figures

Theorems & Definitions (34)

Theorem 2.1
Lemma 2.1
Theorem 3.1
Lemma 3.1
Lemma 3.2
Lemma 3.3
Lemma 3.3
Lemma 3.3
Theorem 3.4
Lemma 4.0
...and 24 more
