Manifold learning in Wasserstein space
Keaton Hamm, Caroline Moosmüller, Bernhard Schmitzer, Matthew Thorpe
TL;DR
This work develops a theoretical framework for manifold learning within the 2-Wasserstein space of absolutely continuous measures by constructing finite-dimensional submanifolds $\Lambda$ via a latent manifold $\mathcal{S}$ and a deformation map. It introduces the geodesic-restricted metric $\mathbb{W}_\Lambda$, derives local linearization results that relate Wasserstein distances to tangent-vector norms using a velocity-field map $B$, and analyzes how the latent structure can be learned from samples through Gromov--Wasserstein convergence and spectral tangent-space recovery. The paper provides constructive schemes (diffeomorphic template deformations and gradient projections), proves consistency results for graph-based approximations, and offers numerical illustrations of tangent-space recovery and of the impact of discretization. These contributions lay a foundation for manifold learning in Wasserstein space and suggest directions for exploiting Riemannian structure and diffusion-inspired embeddings in this non-Euclidean setting.
Abstract
This paper aims to build the theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures $\mathcal{P}_{\mathrm{a.c.}}(Ω)$ with $Ω$ a compact and convex subset of $\mathbb{R}^d$, metrized with the Wasserstein-2 distance $\mathbb{W}$. We begin by introducing a construction of submanifolds $Λ$ in $\mathcal{P}_{\mathrm{a.c.}}(Ω)$ equipped with metric $\mathbb{W}_Λ$, the geodesic restriction of $\mathbb{W}$ to $Λ$. In contrast to other constructions, these submanifolds are not necessarily flat, but still allow for local linearizations in a similar fashion to Riemannian submanifolds of $\mathbb{R}^d$. We then show how the latent manifold structure of $(Λ,\mathbb{W}_Λ)$ can be learned from samples $\{λ_i\}_{i=1}^N$ of $Λ$ and pairwise extrinsic Wasserstein distances $\mathbb{W}$ on $\mathcal{P}_{\mathrm{a.c.}}(Ω)$ only. In particular, we show that the metric space $(Λ,\mathbb{W}_Λ)$ can be asymptotically recovered in the sense of Gromov--Wasserstein from a graph with nodes $\{λ_i\}_{i=1}^N$ and edge weights $\mathbb{W}(λ_i,λ_j)$. In addition, we demonstrate how the tangent space at a sample $λ$ can be asymptotically recovered via spectral analysis of a suitable ``covariance operator'' using optimal transport maps from $λ$ to sufficiently close and diverse samples $\{λ_i\}_{i=1}^N$. The paper closes with some explicit constructions of submanifolds $Λ$ and numerical examples on the recovery of tangent spaces through spectral analysis.
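To make the graph-based recovery of $\mathbb{W}_Λ$ from extrinsic distances concrete, the following is a minimal sketch on a toy family, not the construction used in the paper. It assumes one-dimensional Gaussians $N(m, s^2)$, for which the Wasserstein-2 distance has the closed form $\sqrt{(m_1-m_2)^2 + (s_1-s_2)^2}$; these live on $\mathbb{R}$ rather than on a compact $Ω$, so the example only illustrates the idea. The curve `sample_curve`, the neighbourhood radius `eps`, and all function names are hypothetical choices made for illustration. The parameters $(m, s)$ trace a half circle of radius 1, so the extrinsic distance between the endpoints is the chord length 2, while the geodesic-restricted distance along the curve is the arc length $\pi$.

```python
import math


def w2_gaussian_1d(a, b):
    """Closed-form W2 distance between 1D Gaussians a=(mean, std), b=(mean, std)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def sample_curve(n):
    """Toy submanifold (an assumption): Gaussians whose (mean, std) lie on a half circle."""
    ts = [math.pi * i / (n - 1) for i in range(n)]
    return [(2.0 + math.cos(t), 2.0 + math.sin(t)) for t in ts]


def graph_geodesics(points, eps):
    """All-pairs shortest paths on the eps-neighbourhood graph (Floyd-Warshall).

    Edges connect samples whose extrinsic W2 distance is at most eps,
    and each edge is weighted by that extrinsic distance.
    """
    n = len(points)
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
        for j in range(i + 1, n):
            w = w2_gaussian_1d(points[i], points[j])
            if w <= eps:
                d[i][j] = d[j][i] = w
    for k in range(n):
        for i in range(n):
            dik = d[i][k]
            if dik == math.inf:
                continue
            for j in range(n):
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]
    return d


lams = sample_curve(60)
dist = graph_geodesics(lams, eps=0.2)

extrinsic = w2_gaussian_1d(lams[0], lams[-1])  # chord between the endpoints
graph_est = dist[0][-1]                        # graph estimate of the arc length
print(f"extrinsic W2: {extrinsic:.4f}, graph estimate: {graph_est:.4f}, arc length: {math.pi:.4f}")
```

In this toy setting the shortest-path distance uses only pairwise extrinsic distances between nearby samples, yet it approximates the intrinsic arc length rather than the chord. The choice of `eps` matters: too small and the graph disconnects, too large and long chords shortcut the curve.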
