
Two-sample Test using Projected Wasserstein Distance

Jie Wang, Rui Gao, Yao Xie

TL;DR

This work tackles high-dimensional two-sample testing by introducing the projected Wasserstein distance $\mathcal{P}W$, which maximizes the Wasserstein distance after projecting data onto a $k$-dimensional subspace via an orthogonal map $A$ with $A^{\top}A=I_k$. It develops finite-sample IPM guarantees using Rademacher complexity for the projected function class and formulates a PW-based two-sample test with a data-driven acceptance region. The authors provide practical algorithms (Riemannian gradient methods) to compute $\mathcal{P}W$ and validate the approach with numerical experiments, showing competitive performance overall and improved robustness in high dimensions compared to MMD. The results indicate that projecting to an appropriate low-dimensional subspace mitigates the curse of dimensionality in Wasserstein-based testing, offering a scalable and interpretable tool for high-dimensional distribution comparison.
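The sketch below illustrates the definition in the TL;DR. It approximates $\mathcal{P}W$ by Riemannian gradient ascent over matrices $A\in\mathbb{R}^{d\times k}$ with $A^{\top}A=I_k$, which is the Stiefel manifold. It is a minimal sketch under our own assumptions, not the authors' algorithm:

- it uses the squared 2-Wasserstein distance;
- it assumes equal sample sizes with uniform weights, so the optimal coupling is a permutation found by SciPy's `linear_sum_assignment`;
- it computes the gradient with the current matching held fixed;
- it uses a QR retraction.

The function names `projected_w2_sq` and `projected_wasserstein` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def projected_w2_sq(X, Y, A):
    """Squared 2-Wasserstein distance between the empirical distributions of
    X @ A and Y @ A (equal sample sizes, uniform weights), and the matching."""
    PX, PY = X @ A, Y @ A
    # Pairwise squared Euclidean distances between projected points.
    C = ((PX[:, None, :] - PY[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(C)  # rows == arange(n) for a square C
    return C[rows, cols].mean(), cols


def projected_wasserstein(X, Y, k, steps=200, lr=0.05, seed=0):
    """Approximate max over A (d x k, A^T A = I_k) of W2^2 between the projected
    empirical distributions, by Riemannian gradient ascent on the Stiefel manifold."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    A, _ = np.linalg.qr(rng.standard_normal((d, k)))  # random orthonormal start
    for _ in range(steps):
        _, cols = projected_w2_sq(X, Y, A)
        D = X - Y[cols]                    # matched differences x_i - y_pi(i)
        G = 2.0 * D.T @ (D @ A) / len(X)   # Euclidean gradient, matching held fixed
        # Project G onto the tangent space of the Stiefel manifold at A.
        R = G - A @ (0.5 * (A.T @ G + G.T @ A))
        # QR retraction back onto {A : A^T A = I_k}; fix column signs.
        Q, Rr = np.linalg.qr(A + lr * R)
        A = Q * np.sign(np.diag(Rr))
    value, _ = projected_w2_sq(X, Y, A)
    return value, A
```

Here is how to call it. Take `X` and `Y` of shape `(n, d)`. Then `value, A = projected_wasserstein(X, Y, k=2)` returns two things: the approximate maximized objective, and the learned orthonormal projection. The objective is non-concave in $A$. A cheap safeguard is to restart from several seeds and keep the largest value.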

Abstract

We develop a projected Wasserstein distance for the two-sample test, a fundamental problem in statistics and machine learning: given two sets of samples, determine whether they are drawn from the same distribution. In particular, we aim to circumvent the curse of dimensionality in the Wasserstein distance: in high dimensions its testing power diminishes, which is inherently due to the slow concentration of Wasserstein metrics in high-dimensional spaces. A key contribution is an optimal projection step that finds the low-dimensional linear mapping maximizing the Wasserstein distance between the projected probability distributions. We characterize the finite-sample convergence rate of IPMs and present practical algorithms for computing this metric. Numerical examples validate our theoretical results.
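The slow concentration behind the curse of dimensionality can be seen directly. Take two independent samples from the same distribution. The population Wasserstein distance between them is zero, yet at a fixed sample size the empirical estimate grows with the dimension.

The sketch below assumes:

- Gaussian data $N(0, I_d)$;
- $n=100$ points per sample;
- exact $W_2$, computed as an assignment problem.

The names `empirical_w2` and `null_w2` are hypothetical helpers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def empirical_w2(X, Y):
    """Exact 2-Wasserstein distance between two equal-size empirical
    distributions with uniform weights (optimal coupling is a permutation)."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(C)
    return np.sqrt(C[rows, cols].mean())


def null_w2(d, n=100, seed=0):
    """W2 between two independent size-n samples from the SAME law N(0, I_d);
    the population distance is 0, so any positive value is estimation error."""
    rng = np.random.default_rng(seed)
    return empirical_w2(rng.standard_normal((n, d)), rng.standard_normal((n, d)))


for d in (1, 10, 100):
    print(d, round(null_w2(d), 3))
```

The script prints the empirical distance for $d=1, 10, 100$. The value increases with $d$ even though the two distributions are identical. This is what erodes the power of a full-dimensional Wasserstein test in high dimensions.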

Paper Structure

This paper contains 11 sections, 8 theorems, 43 equations, 3 figures, 1 algorithm.

Key Result

Proposition 1

Assume that the light-tail assumption is satisfied, and let $\epsilon>0$. Then, with probability at least …, we have …

Figures (3)

  • Figure 1: Comparison of ROC curves of the PW test versus the MMD test on two types of synthetic data. (a) Mean-shifted Gaussians with $n\in\{75,100\}$ and fixed $d=120$. (b) Mean-shifted Gaussians with $d\in\{60,120\}$ and fixed $n=75$. (c) Covariance-shifted Gaussians with $n\in\{75,100\}$ and fixed $d=60$. (d) Covariance-shifted Gaussians with $d\in\{30,60\}$ and fixed $n=75$.
  • Figure 2: Mean values and $95\%$ confidence intervals for $\mathcal{P}W(\hat{\mu}_n, \hat{\nu}_n)$ across different sample sizes $n$, averaged over $100$ independent trials. (a) corresponds to $H_0$ and (b) to $H_1$.
  • Figure 3: (a) Illustration of the projection mapping trained on two collections of samples generated from two different target distributions with $m=n=100$. The red and blue points are drawn from Gaussian distributions with two different covariance matrices, and the purple arrow denotes the optimized projection mapping. (b) KDE plot of the empirical distributions after projection.
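Figure 1 evaluates the PW test through ROC curves on mean- and covariance-shifted Gaussians. To turn the statistic into a level-$\alpha$ decision one needs a threshold. The source calibrates the test with a data-driven acceptance region whose exact form is not given here. The sketch below therefore swaps in a generic permutation calibration, which is not necessarily the paper's method.

It also assumes:

- $k=1$, i.e. a single unit direction;
- equal sample sizes;
- the squared $W_2$ distance.

The names `pw_stat` and `permutation_test` are hypothetical.

```python
import numpy as np


def pw_stat(X, Y, steps=100, lr=0.1, seed=0):
    """Projected squared 2-Wasserstein statistic with k = 1 (unit direction a).
    Assumes equal sample sizes; in 1-D the optimal coupling pairs sorted values."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(X.shape[1])
    a /= np.linalg.norm(a)
    for _ in range(steps):
        ix, iy = np.argsort(X @ a), np.argsort(Y @ a)
        D = X[ix] - Y[iy]                 # differences under the sorted matching
        g = 2.0 * D.T @ (D @ a) / len(X)  # Euclidean gradient, matching held fixed
        g -= (g @ a) * a                  # project onto the sphere's tangent space
        a = a + lr * g
        a /= np.linalg.norm(a)            # retraction: renormalize to unit length
    px, py = np.sort(X @ a), np.sort(Y @ a)
    return np.mean((px - py) ** 2)


def permutation_test(X, Y, n_perm=100, alpha=0.05, seed=0):
    """Generic permutation calibration (assumed substitute for the paper's
    acceptance region): compare the observed statistic with its values on
    random re-splits of the pooled sample."""
    rng = np.random.default_rng(seed)
    observed = pw_stat(X, Y)
    Z, n = np.vstack([X, Y]), len(X)
    null_stats = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(len(Z))
        null_stats[b] = pw_stat(Z[perm[:n]], Z[perm[n:]])
    # The add-one correction keeps the p-value valid under H0.
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_perm)
    return p_value, p_value <= alpha
```

An ROC curve can then be traced as follows:

1. Repeat the test on many independent datasets generated under $H_0$ and under $H_1$.
2. Sweep the rejection threshold.
3. Plot the false-positive rate against the true-positive rate.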

Theorems & Definitions (13)

  • Definition 1: Integral Probability Metric
  • Definition 2: Projected Wasserstein Distance
  • Definition 3: Rademacher complexity
  • Proposition 1: Finite-sample Guarantee for the rate of convergence of IPM
  • Proposition 2: Improved rate of convergence of IPM
  • Corollary 1
  • Example 1: Wasserstein Distance for Two-sample Tests
  • Proposition 3
  • Theorem 1
  • Remark 1
  • ...and 3 more
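For orientation, Definitions 1 to 3 are written out below in their standard textbook form. Also shown is the Kantorovich–Rubinstein identity, which makes the projected distance an IPM over a projected function class when $W$ is the 1-Wasserstein distance. This is a sketch, not a transcription of the paper. Its conventions may differ from the source, for example in the order of $W$, whether the IPM carries an absolute value, and how the Rademacher average is normalized.

```latex
\begin{aligned}
&\text{IPM:} && d_{\mathcal{F}}(\mu,\nu) = \sup_{f\in\mathcal{F}} \Big| \mathbb{E}_{X\sim\mu}[f(X)] - \mathbb{E}_{Y\sim\nu}[f(Y)] \Big|,\\
&\text{Projected Wasserstein:} && \mathcal{P}W(\mu,\nu) = \max_{A\in\mathbb{R}^{d\times k}:\,A^{\top}A = I_k} W\big((A^{\top})_{\#}\mu,\,(A^{\top})_{\#}\nu\big),\\
&\text{Rademacher complexity:} && \mathcal{R}_n(\mathcal{F}) = \mathbb{E}\Big[\sup_{f\in\mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(X_i)\Big],
  \quad \sigma_i \overset{\text{iid}}{\sim} \mathrm{Unif}\{-1,+1\},\ X_i \overset{\text{iid}}{\sim} \mu.\\[4pt]
&\text{With } W = W_1: && \mathcal{P}W_1(\mu,\nu)
  = \max_{A^{\top}A=I_k}\ \sup_{\|g\|_{\mathrm{Lip}}\le 1} \Big|\mathbb{E}_{\mu}[g(A^{\top}X)] - \mathbb{E}_{\nu}[g(A^{\top}Y)]\Big|
  = d_{\mathcal{F}_k}(\mu,\nu),\\
& && \mathcal{F}_k = \big\{x \mapsto g(A^{\top}x) : A^{\top}A = I_k,\ \|g\|_{\mathrm{Lip}}\le 1\big\}.
\end{aligned}
```

The last line is what lets Rademacher-complexity bounds for the class $\mathcal{F}_k$ translate into finite-sample rates for $\mathcal{P}W_1$. The class depends on the dimension only through the $k$-dimensional inputs of $g$ and the choice of $A$.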