
Structure preservation via the Wasserstein distance

Daniel Bartl, Shahar Mendelson

TL;DR

The paper studies how well a high-dimensional random sample preserves the structure of its underlying distribution, by comparing the empirical and true one-dimensional marginals in every direction $\theta$ simultaneously through the max-sliced Wasserstein distance $\mathcal{SW}_2$. Given $m$ i.i.d. copies of $X$, with law $\mu$ and empirical measure $\mu_m$, it shows that under minimal assumptions and with high probability $\mathcal{SW}_2(\mu_m,\mu) \le c\left(\frac{d}{m}\right)^{1/4}$, and that this rate is optimal in general. The approach hinges on a sharp one-dimensional Wasserstein estimate expressed via inverse distribution functions and on a scale-sensitive, multivariate DKW-type inequality for linear functionals, complemented by a global modulus of continuity for $F_{\mu^\theta}^{-1}$ and deterministic perturbation arguments. In special cases such as log-concave or subgaussian distributions, the rate improves to $\sqrt{d/m}$ up to logarithmic factors; the paper also establishes corresponding lower bounds and explores first-order variants such as $\mathcal{SW}_1$. Overall, the results quantify the rates at which empirical marginals converge to their true counterparts uniformly over all directions.
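
For reference, the standard definitions behind the quantities in the summary can be written out as follows. This is a sketch of the usual conventions rather than a quotation from the paper: $\mu^\theta$ is assumed to denote the law of $\langle X,\theta\rangle$ when $X\sim\mu$, and the one-dimensional Wasserstein distance is expressed through inverse distribution functions, which is the form the summary alludes to.

\[
\mathcal{SW}_2(\mu,\nu) = \sup_{\theta\in S^{d-1}} \mathcal{W}_2\left(\mu^\theta,\nu^\theta\right),
\qquad
\mathcal{W}_2\left(\mu^\theta,\nu^\theta\right) = \left( \int_0^1 \left| F_{\mu^\theta}^{-1}(u) - F_{\nu^\theta}^{-1}(u) \right|^2 du \right)^{1/2}.
\]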

Abstract

We show that under minimal assumptions on a random vector $X\in\mathbb{R}^d$ and with high probability, given $m$ independent copies of $X$, the coordinate distribution of each vector $(\langle X_i,\theta\rangle)_{i=1}^m$ is dictated by the distribution of the true marginal $\langle X,\theta\rangle$. Specifically, we show that with high probability, \[\sup_{\theta\in S^{d-1}} \left( \frac{1}{m}\sum_{i=1}^m \left|\langle X_i,\theta\rangle^\sharp - \lambda^\theta_i \right|^2 \right)^{1/2} \leq c \left( \frac{d}{m} \right)^{1/4},\] where $\lambda^\theta_i = m\int_{(\frac{i-1}{m}, \frac{i}{m}]} F_{ \langle X,\theta\rangle }^{-1}(u)\,du$ and $a^\sharp$ denotes the monotone non-decreasing rearrangement of $a$. Moreover, this estimate is optimal. The proof follows from a sharp estimate on the worst Wasserstein distance between a marginal of $X$ and its empirical counterpart, $\frac{1}{m} \sum_{i=1}^m \delta_{\langle X_i, \theta\rangle}$.
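
To make the quantity in the abstract concrete, here is a minimal numerical sketch under a strong simplifying assumption that is not taken from the paper: $X$ is a standard Gaussian vector, so that every marginal $\langle X,\theta\rangle$ is $N(0,1)$ and the averaged quantiles have the closed form $m\,(\varphi(\Phi^{-1}(\tfrac{i-1}{m})) - \varphi(\Phi^{-1}(\tfrac{i}{m})))$. The function names (`gaussian_lambdas`, `deviation`, `max_deviation_over_directions`) are hypothetical, and the supremum over the sphere is replaced by a maximum over finitely many random directions, which only yields a lower bound on it.

```python
import math
import random
from statistics import NormalDist


def gaussian_lambdas(m):
    """Averaged quantiles lambda_i = m * int_{((i-1)/m, i/m]} Phi^{-1}(u) du
    for the N(0,1) marginal (Gaussian case only, an assumption of this sketch)."""
    std = NormalDist()
    # phi(Phi^{-1}(i/m)) on the grid; the value is 0 at u = 0 and u = 1.
    dens = [0.0] + [std.pdf(std.inv_cdf(i / m)) for i in range(1, m)] + [0.0]
    # int_a^b Phi^{-1}(u) du = phi(Phi^{-1}(a)) - phi(Phi^{-1}(b)).
    return [m * (dens[i - 1] - dens[i]) for i in range(1, m + 1)]


def random_direction(d, rng):
    """Uniform random point on the unit sphere S^{d-1}."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def deviation(sample, theta, lambdas):
    """( (1/m) * sum_i |<X_i, theta>^sharp - lambda_i|^2 )^{1/2} for one direction."""
    # Sorting gives the monotone non-decreasing rearrangement of the projections.
    proj = sorted(sum(x * t for x, t in zip(row, theta)) for row in sample)
    m = len(proj)
    return math.sqrt(sum((p - l) ** 2 for p, l in zip(proj, lambdas)) / m)


def max_deviation_over_directions(d, m, n_dirs, seed=0):
    """Maximum of the deviation over n_dirs random directions.
    This is a lower bound on the supremum over the whole sphere."""
    rng = random.Random(seed)
    sample = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    lambdas = gaussian_lambdas(m)
    return max(
        deviation(sample, random_direction(d, rng), lambdas)
        for _ in range(n_dirs)
    )
```

Calling, for example, `max_deviation_over_directions(d=5, m=2000, n_dirs=50)` for increasing `m` gives a rough feel for how the deviation shrinks. Because only finitely many directions are sampled, the output should not be read as a check of the $(d/m)^{1/4}$ bound itself.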

Paper Structure (13 sections, 26 theorems, 158 equations)

Key Result

Theorem 1.4

Let $X$ be centred and isotropic, and assume that $\sup_{\theta \in S^{d-1}} \|\left\langle X,\theta \right\rangle\|_{L_q} \leq L$ for some $q\geq 4$. Then there are absolute constants $c_0,c_1,c_2$ and a constant $c_3$ that depends only on $q$ and $L$ such that the following holds. Let $0<\Delta\le

Theorems & Definitions (59)

  • Definition 1.2
  • Remark 1.3
  • Theorem 1.4
  • Remark 1.5
  • Remark 1.6
  • Theorem 1.7
  • Remark 1.8
  • Theorem 1.9
  • Lemma 2.1
  • Lemma 2.2
  • ...and 49 more