Minimax-Optimal Two-Sample Test with Sliced Wasserstein

Binh Thuan Tran, Nicolas Schreuder

TL;DR

This work develops a new nonparametric two-sample test based on the sliced Wasserstein distance. By using a permutation framework, it achieves finite-sample Type I error control and provides non-asymptotic power guarantees, showing minimax optimality with separation rate $n^{-1/2}$ over multinomial and bounded-support alternatives. The method leverages random projections to reduce dimensionality and analyzes the trade-off between the number of projections and statistical power, while maintaining scalability through efficient computation and parallelism. Empirical results on synthetic data and MNIST demonstrate robust performance without kernel tuning, highlighting the method’s practical utility for geometry-aware distribution testing in high dimensions.

Abstract

We study the problem of nonparametric two-sample testing using the sliced Wasserstein (SW) distance. While prior theoretical and empirical work indicates that the SW distance offers a promising balance between strong statistical guarantees and computational efficiency, its theoretical foundations for hypothesis testing remain limited. We address this gap by proposing a permutation-based SW test and analyzing its performance. The test inherits finite-sample Type I error control from the permutation principle. Moreover, we establish non-asymptotic power bounds and show that the procedure achieves the minimax separation rate $n^{-1/2}$ over multinomial and bounded-support alternatives, matching the optimal guarantees of kernel-based tests while building on the geometric foundations of Wasserstein distances. Our analysis further quantifies the trade-off between the number of projections and statistical power. Finally, numerical experiments demonstrate that the test combines finite-sample validity with competitive power and scalability, and -- unlike kernel-based tests, which require careful kernel tuning -- it performs consistently well across all scenarios we consider.
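The permutation-based SW test described above can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the default values of `num_proj`, `num_perm`, and `alpha`, the restriction to equal sample sizes, and the choice to reuse one set of projection directions across all permutations are assumptions made here for illustration only.

```python
import math
import random


def random_directions(num_proj, dim, rng):
    """Draw directions uniformly on the unit sphere via normalized Gaussians."""
    dirs = []
    for _ in range(num_proj):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        dirs.append([c / norm for c in v])
    return dirs


def sw2_squared(xs, ys, dirs):
    """Monte Carlo estimate of squared SW_2 between two equal-size samples.

    Assumption: len(xs) == len(ys), so the 1-D squared Wasserstein-2 distance
    along each direction is the mean squared gap between sorted projections.
    """
    assert len(xs) == len(ys)
    total = 0.0
    for d in dirs:
        px = sorted(sum(a * b for a, b in zip(x, d)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, d)) for y in ys)
        total += sum((u - v) ** 2 for u, v in zip(px, py)) / len(px)
    return total / len(dirs)


def sw_permutation_test(xs, ys, num_proj=20, num_perm=99, alpha=0.05, seed=0):
    """Return (p_value, reject) for H0: both samples share one distribution."""
    rng = random.Random(seed)
    # Hypothetical design choice: directions are drawn once and reused.
    dirs = random_directions(num_proj, len(xs[0]), rng)
    observed = sw2_squared(xs, ys, dirs)
    pooled = list(xs) + list(ys)
    n = len(xs)
    count = 1  # the observed statistic counts as one arrangement
    for _ in range(num_perm):
        rng.shuffle(pooled)  # random relabeling of the pooled sample
        if sw2_squared(pooled[:n], pooled[n:], dirs) >= observed:
            count += 1
    p_value = count / (num_perm + 1)
    return p_value, p_value <= alpha
```

Calling `sw_permutation_test(xs, ys)` on two lists of equal-length coordinate lists returns the permutation p-value and the reject decision at level `alpha`. Because the projections along different directions are independent computations, the inner loop over `dirs` is the natural place to parallelize.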

Paper Structure

This paper contains 48 sections, 24 theorems, 176 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Theorem 4

The test $\Delta$ defined in Algorithm 1 has non-asymptotic level $\alpha$ for any $\alpha \in (0,1)$. That is, under the null hypothesis the test rejects with probability at most $\alpha$, where the probability is taken over the samples, projection directions, and permutation randomness.
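For intuition, the standard exchangeability argument behind level guarantees of this kind can be sketched as follows. The notation is illustrative and not taken from the paper: $T_0$ denotes the observed statistic, $T_1,\dots,T_B$ the statistics computed on $B$ independently and uniformly permuted copies of the pooled sample, and $\hat{p}$ the permutation p-value. The paper's own proof may be organized differently.

```latex
% Assumed notation: T_0 observed statistic, T_1, ..., T_B permuted statistics.
\hat{p} \;=\; \frac{1 + \sum_{b=1}^{B} \mathbf{1}\{T_b \ge T_0\}}{B+1},
\qquad
\Delta \;=\; \mathbf{1}\{\hat{p} \le \alpha\}.

% Under H_0 the pooled sample is exchangeable, so (T_0, T_1, ..., T_B) is
% exchangeable and the rank of T_0 among them is (super-)uniform:
\mathbb{P}_{H_0}\bigl(\hat{p} \le \alpha\bigr)
\;\le\; \frac{\lfloor \alpha (B+1) \rfloor}{B+1}
\;\le\; \alpha .
```

The bound holds for every finite sample size and every $B$, which is why no asymptotic approximation is needed for Type I error control.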

Figures (6)

  • Figure 1: Power across three scenarios: Gaussian covariance shift, ball vs. sphere, and MNIST mixture.
  • Figure 2: SW Test power vs. number of projections (fixed sample sizes $n=m=140$)
  • Figure 3: Computation time (log scale) of SW tests
  • Figure 4: Computation time for ball vs. sphere
  • Figure 5: Histograms of the test statistic $\widehat{\operatorname{SW}}^{\,2}_{2}$ computed from $8000$ i.i.d. samples drawn respectively from (left) Gaussian $\mathcal{N}(0,I_2)$, (middle) Uniform on $[-1,1]^2$, and (right) a two-component Gaussian mixture $\tfrac{1}{2}\mathcal{N}(0,I_2)+\tfrac{1}{2}\mathcal{N}(\mathbf{m},I_2)$, where $\mathbf{m}=(2.5,2.5)^\top$.
  • ...and 1 more figure

Theorems & Definitions (31)

  • Definition 1: SW distance
  • Remark 2
  • Remark 3
  • Theorem 4: Type I error Control
  • Theorem 5: Power Control
  • Proposition 6
  • Proposition 7
  • Lemma 8: McDiarmid's Inequality
  • Lemma 9: Hoeffding's Inequality
  • Definition 10: $(n,m)$-symmetric function
  • ...and 21 more