Wasserstein Spatial Depth
François Bachoc, Alberto González-Sanz, Jean-Michel Loubes, Yisha Yao
TL;DR
We address the problem of ranking distribution-valued data in non-Euclidean Wasserstein spaces by introducing Wasserstein spatial depth (WSD). For a candidate distribution $Q$ and a random distribution $P\sim{\bf P}$, WSD combines the optimal transport map $T_{Q,P}$ from $Q$ to $P$ with the Wasserstein distance $\mathcal{W}_2$ to define the depth $SD(Q;{\bf P})=1-\left\| \mathbb{E}_{P\sim{\bf P}}\left[ (\mathbf{x}-T_{Q,P}(\mathbf{x}))/\mathcal{W}_2(P,Q) \right]\right\|_{L^2(Q)}$. This definition preserves the key properties of a statistical depth and admits a plug-in estimator under one- and two-stage sampling. The paper gives explicit forms in the univariate, location-family, and Gaussian settings, proves the core properties (range, invariance, vanishing at infinity, maximality at the center, continuity), and establishes consistency and asymptotic normality of the empirical WSD. In simulations and a real-data climate application, WSD orders distributions and detects outliers more reliably than embedding-based or other non-Wasserstein depths. The work thus enables geometry-aware, depth-based analysis of distribution-valued data and paves the way for broader statistical tools in Wasserstein space.
Abstract
Modeling observations as random distributions embedded within Wasserstein spaces is becoming increasingly popular across scientific fields, as it captures the variability and geometric structure of the data more effectively. However, the distinct geometry and unique properties of Wasserstein space pose challenges to the application of conventional statistical tools, which are primarily designed for Euclidean spaces. Consequently, adapting existing methodologies and developing new ones for analysis within Wasserstein spaces has become essential. The space of distributions on $\mathbb{R}^d$ with $d>1$ is not linear; instead, it "mimics" the geometry of a Riemannian manifold. In this paper, we extend the concept of statistical depth to distribution-valued data, introducing the notion of Wasserstein spatial depth (WSD). This new measure provides a way to rank and order distributions, enabling the development of order-based clustering techniques and inferential tools. We show that WSD preserves critical properties of conventional statistical depths, namely taking values in $[0,1]$, transformation invariance, vanishing at infinity, reaching a maximum at the geometric median, and continuity. Additionally, the population WSD has a straightforward plug-in estimator based on sampled empirical distributions. We establish the estimator's consistency and asymptotic normality. Extensive simulations and a real-data application demonstrate the practical efficacy of WSD.
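To make the plug-in idea concrete, the following is a minimal sketch for the univariate case only; it is not the authors' implementation. It relies on the standard one-dimensional fact that, for atomless $Q$, the optimal transport map is $T_{Q,P}=F_P^{-1}\circ F_Q$, so that the depth can be written with quantile functions as $1-\left\|\mathbb{E}_{P}\left[(F_Q^{-1}-F_P^{-1})/\|F_Q^{-1}-F_P^{-1}\|_{L^2(0,1)}\right]\right\|_{L^2(0,1)}$. The function name `wsd_univariate`, the grid size `n_grid`, the midpoint quadrature, and the convention that a term with $P=Q$ contributes zero are all assumptions made for illustration.

```python
import numpy as np

def wsd_univariate(query, samples, n_grid=200, tol=1e-12):
    """Plug-in Wasserstein spatial depth of `query` w.r.t. `samples` (1-D data only).

    `query` is a 1-D array of draws from Q; `samples` is a list of 1-D arrays,
    each holding draws from one observed distribution P_i.
    """
    # Midpoint grid on (0, 1); averages over it approximate L^2(0, 1) integrals.
    grid = (np.arange(n_grid) + 0.5) / n_grid
    q = np.quantile(np.asarray(query, dtype=float), grid)
    curves = np.stack(
        [np.quantile(np.asarray(s, dtype=float), grid) for s in samples]
    )
    diffs = q[None, :] - curves                    # F_Q^{-1} - F_{P_i}^{-1}
    w2 = np.sqrt(np.mean(diffs ** 2, axis=1))      # approximates W_2(P_i, Q)
    # Assumed convention: a sample at zero distance from Q contributes 0.
    keep = w2 > tol
    units = np.zeros_like(diffs)
    units[keep] = diffs[keep] / w2[keep, None]
    mean_direction = units.mean(axis=0)            # empirical expectation
    return 1.0 - np.sqrt(np.mean(mean_direction ** 2))

# Toy location family: one base sample shifted by symmetric offsets.
base = np.linspace(-1.0, 1.0, 101)
family = [base + m for m in (-2.0, -1.0, 0.0, 1.0, 2.0)]
depth_center = wsd_univariate(base, family)
depth_far = wsd_univariate(base + 10.0, family)
print(depth_center, depth_far)
```

In this toy family the centered query attains depth 1 and the far-shifted query depth 0, up to floating-point error, which matches the "maximal at the center" and "vanishing at infinity" properties stated above. For $d>1$ the transport maps have no such closed form in general, so this shortcut does not carry over.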
