Wasserstein Spatial Depth

François Bachoc, Alberto González-Sanz, Jean-Michel Loubes, Yisha Yao

TL;DR

We address the problem of ranking distribution-valued data in non-Euclidean Wasserstein spaces by introducing Wasserstein spatial depth (WSD). WSD leverages optimal transport maps $T_{Q,P}$ and the Wasserstein distance $\mathcal{W}_2$ to define a depth $SD(Q;{\bf P})=1-\left\| \mathbb{E}_{P\sim{\bf P}}\left[ (\mathbf{x}-T_{Q,P}(\mathbf{x}))/\mathcal{W}_2(P,Q) \right]\right\|_{L^2(Q)}$, preserving key depth properties and enabling consistent plug-in estimation under one- and two-stage sampling. The paper provides explicit forms in univariate, location-family, and Gaussian settings, proves core properties (range, invariance, vanishing at infinity, maximality at center, continuity), and develops rigorous consistency and asymptotic normality results for the empirical WSD. Through simulations and a real-data climate application, WSD demonstrates superior ordering, robust outlier detection, and tangible advantages over embedding-based or other non-Wasserstein depths. The work thus enables reliable, geometry-aware depth-based analysis for distribution-valued data and paves the way for broader Wasserstein-space statistical tools.
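To make the location-family case concrete, here is a short worked reduction. It is a sketch under stated assumptions, not the paper's own statement: every $P$ in the support of ${\bf P}$, and $Q$ itself, is assumed to be a translate of one common base distribution. The location vectors $m_P$ and $m_Q$ are notation introduced here for illustration. Translations are optimal for the quadratic cost, so

$$
T_{Q,P}(\mathbf{x}) = \mathbf{x} + (m_P - m_Q), \qquad \mathcal{W}_2(P,Q) = \| m_P - m_Q \| .
$$

The integrand $\mathbf{x}-T_{Q,P}(\mathbf{x}) = m_Q - m_P$ is then constant in $\mathbf{x}$, and the $L^2(Q)$ norm of a constant vector is its Euclidean norm, which gives

$$
SD(Q;{\bf P}) = 1-\left\| \mathbb{E}_{P\sim{\bf P}}\left[ \frac{m_Q - m_P}{\| m_Q - m_P \|} \right]\right\| .
$$

Under these assumptions the depth of $Q$ coincides with the conventional spatial depth of its location $m_Q$ with respect to the distribution of the locations $m_P$.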

Abstract

Modeling observations as random distributions embedded within Wasserstein spaces is becoming increasingly popular across scientific fields, as it captures the variability and geometric structure of the data more effectively. However, the distinct geometry and unique properties of Wasserstein space pose challenges to the application of conventional statistical tools, which are primarily designed for Euclidean spaces. Consequently, adapting and developing new methodologies for analysis within Wasserstein spaces has become essential. The space of distributions on $\mathbb{R}^d$ with $d>1$ is not linear; instead, it "mimics" the geometry of a Riemannian manifold. In this paper, we extend the concept of statistical depth to distribution-valued data, introducing the notion of Wasserstein spatial depth. This new measure provides a way to rank and order distributions, enabling the development of order-based clustering techniques and inferential tools. We show that Wasserstein spatial depth (WSD) preserves critical properties of conventional statistical depths, notably ranging within $[0,1]$, transformation invariance, vanishing at infinity, reaching a maximum at the geometric median, and continuity. Additionally, the population WSD has a straightforward plug-in estimator based on sampled empirical distributions. We establish the estimator's consistency and asymptotic normality. Extensive simulations and a real-data application showcase the practical efficacy of WSD.
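The plug-in estimator can be illustrated in the univariate setting, where the optimal transport map is $T_{Q,P} = F_P^{-1}\circ F_Q$ and the $L^2(Q)$ norm becomes an $L^2(0,1)$ norm of quantile-function differences. The Python sketch below is a minimal illustration under those assumptions, not the authors' implementation. The function names (`quantile_curve`, `wsd_univariate`), the midpoint grid on $(0,1)$, and the convention that a term with $\mathcal{W}_2(P,Q)=0$ contributes zero are all choices made here.

```python
import numpy as np


def quantile_curve(sample, grid):
    """Empirical quantile function of a 1-D sample, evaluated on a grid in (0, 1)."""
    return np.quantile(np.asarray(sample, dtype=float), grid)


def wsd_univariate(query, samples, n_grid=512):
    """Plug-in Wasserstein spatial depth of `query` w.r.t. a list of 1-D samples.

    Each distribution is represented by an array of draws. In one dimension,
    x - T_{Q,P}(x) under Q corresponds to F_Q^{-1}(u) - F_P^{-1}(u) for u in (0, 1).
    """
    # Midpoint grid on (0, 1); an illustrative discretization choice.
    grid = (np.arange(n_grid) + 0.5) / n_grid
    q = quantile_curve(query, grid)

    directions = []
    for s in samples:
        diff = q - quantile_curve(s, grid)
        w2 = np.sqrt(np.mean(diff ** 2))  # grid approximation of W_2(P, Q)
        if w2 > 0:
            directions.append(diff / w2)  # unit-norm "direction" from P to Q
        else:
            # Assumed convention: a distribution equal to Q contributes zero.
            directions.append(np.zeros_like(diff))

    mean_direction = np.mean(directions, axis=0)
    # Depth is one minus the L^2(0, 1) norm of the averaged direction.
    return 1.0 - np.sqrt(np.mean(mean_direction ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=200)
    # Four translates of the same sample, placed symmetrically around `base`.
    population = [base + c for c in (-2.0, -1.0, 1.0, 2.0)]
    print(wsd_univariate(base, population))         # central query: depth close to 1
    print(wsd_univariate(base + 10.0, population))  # far-away query: depth close to 0
```

In this toy population the shifts are symmetric, so the unit directions cancel for the central query and all align for the distant one. This matches the "maximality at center" and "vanishing at infinity" properties listed in the abstract.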

Paper Structure

This paper contains 37 sections, 8 theorems, 150 equations, 7 figures.

Key Result

Theorem 5.1

Set $\mathbf{P}\in \mathcal{P}(\mathcal{P}_2(\mathbb{R}^d))$. Then the following properties hold:

Figures (7)

  • Figure 1: The green solid lines depict the change of theoretical WSD along the parameter indexing ${\bf P}$. The black circles represent the distribution of empirical WSDs, with error bars indicating one standard deviation above and below the mean.
  • Figure 2: The relationships between the WSD and conventional spatial depth in the four cases of Section \ref{subsection:WSd:vs:conv}.
  • Figure 3: Left panel: the distributions are drawn according to Case 1. Right panel: the distributions are drawn according to Case 2. The green dots represent regular distributions from the population $\mathbf{P}$, and the orange dots represent the outlier distributions.
  • Figure 4: (a): The data points are drawn from the distributions of Case 1. The green triangles represent data points from the regular distributions, while orange triangles represent data points from the exotic distributions. (b): The green dots represent the WSD values of the regular distributions, while the orange dots represent the WSD values of the exotic distributions. (c): Each dot represents the MBD of a distribution. The coloring pattern is the same as before. (d): Each dot represents the FSD of a distribution. The coloring pattern remains the same.
  • Figure 5: (a): The data points are drawn from the distributions of Case 2. The green triangles represent data points from the regular distributions, while orange triangles represent data points from the exotic distributions. (b): The green dots represent the WSD values of the regular distributions, while the orange dots represent the WSD values of the exotic distributions. (c): Each dot represents the MBD of a distribution. The coloring pattern is the same as before. (d): Each dot represents the FSD of a distribution. The coloring pattern remains the same.
  • ...and 2 more figures

Theorems & Definitions (19)

  • Definition 3.1
  • Theorem 5.1
  • Theorem 5.2
  • Remark 5.3
  • Theorem 5.4
  • Lemma 5.5: Continuity of $T^Q$
  • Theorem 5.6
  • Theorem 6.1
  • Lemma 6.2
  • Theorem 6.3
  • ...and 9 more