Manifold learning in Wasserstein space

Keaton Hamm; Caroline Moosmüller; Bernhard Schmitzer; Matthew Thorpe

TL;DR

This work develops a theoretical framework for manifold learning within the 2-Wasserstein space of absolutely continuous measures by constructing finite-dimensional submanifolds $\Lambda$ via a latent manifold $\mathcal{S}$ and a deformation map. It introduces the geodesic-restricted metric $\mathbb{W}_\Lambda$, derives local linearization results that relate Wasserstein distances to tangent-vector norms using a velocity-field map $B$, and analyzes how the latent structure can be learned from samples through Gromov--Wasserstein convergence and spectral tangent-space recovery. The paper provides constructive schemes (diffeomorphic template deformations and gradient projections), proves consistency results for graph-based approximations, and offers numerical illustrations demonstrating tangent-space recovery and the impact of discretization. These contributions lay a foundation for manifold learning in Wasserstein space and suggest directions for exploiting Riemannian structure and diffusion-inspired embeddings in this non-Euclidean setting.

Abstract

This paper aims to build the theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures $\mathcal{P}_{\mathrm{a.c.}}(Ω)$, with $Ω$ a compact and convex subset of $\mathbb{R}^d$, metrized with the Wasserstein-2 distance $\mathbb{W}$. We begin by introducing a construction of submanifolds $Λ$ in $\mathcal{P}_{\mathrm{a.c.}}(Ω)$ equipped with the metric $\mathbb{W}_Λ$, the geodesic restriction of $\mathbb{W}$ to $Λ$. In contrast to other constructions, these submanifolds are not necessarily flat, but still allow for local linearizations in a similar fashion to Riemannian submanifolds of $\mathbb{R}^d$. We then show how the latent manifold structure of $(Λ,\mathbb{W}_Λ)$ can be learned from samples $\{λ_i\}_{i=1}^N$ of $Λ$ and pairwise extrinsic Wasserstein distances $\mathbb{W}$ on $\mathcal{P}_{\mathrm{a.c.}}(Ω)$ only. In particular, we show that the metric space $(Λ,\mathbb{W}_Λ)$ can be asymptotically recovered in the sense of Gromov--Wasserstein from a graph with nodes $\{λ_i\}_{i=1}^N$ and edge weights $\mathbb{W}(λ_i,λ_j)$. In addition, we demonstrate how the tangent space at a sample $λ$ can be asymptotically recovered via spectral analysis of a suitable "covariance operator" using optimal transport maps from $λ$ to sufficiently close and diverse samples $\{λ_i\}_{i=1}^N$. The paper closes with some explicit constructions of submanifolds $Λ$ and numerical examples on the recovery of tangent spaces through spectral analysis.
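The graph-based recovery of $(Λ,\mathbb{W}_Λ)$ can be illustrated with a small sketch in the spirit of Isomap. It builds a graph on samples $λ_i$ with edge weights $\mathbb{W}(λ_i,λ_j)$ and uses shortest-path lengths as a proxy for the intrinsic metric $\mathbb{W}_Λ$. Several choices here are assumptions, not taken from the paper. Edges are kept only when $\mathbb{W}(λ_i,λ_j) \le \varepsilon$, and the threshold `eps` is a hypothetical parameter. Shortest paths are computed with Floyd-Warshall. The data come from a hypothetical one-parameter family of uniform measures on intervals of $\mathbb{R}$, for which $\mathbb{W}$ has a closed form. The quantile functions of this family trace a circle in $\mathrm{L}^2([0,1])$, so $\mathbb{W}_Λ$ is arc length while the extrinsic $\mathbb{W}$ is chord length. All function names are illustrative.

```python
import numpy as np

def w2_uniform_intervals(m1, h1, m2, h2):
    """Exact Wasserstein-2 distance between Unif[m1-h1, m1+h1] and Unif[m2-h2, m2+h2].

    In 1D, W2 is the L2 distance between quantile functions; for
    Q(u) = m + h * (2u - 1) this gives W2^2 = (m1 - m2)^2 + (h1 - h2)^2 / 3.
    """
    return np.sqrt((m1 - m2) ** 2 + (h1 - h2) ** 2 / 3.0)

def sample_curve(theta, r=1.0):
    """Hypothetical family Lambda: lambda_theta = Unif[m - h, m + h] with
    m = r cos(theta), h = sqrt(3) r sin(theta), theta in (0, pi).
    Its quantile functions trace a circle of radius r in L2([0, 1])."""
    return r * np.cos(theta), np.sqrt(3.0) * r * np.sin(theta)

def graph_geodesic_distances(m, h, eps):
    """All-pairs shortest paths on the eps-neighbourhood graph whose edge
    weights are the extrinsic distances W(lambda_i, lambda_j)."""
    W = w2_uniform_intervals(m[:, None], h[:, None], m[None, :], h[None, :])
    D = np.where(W <= eps, W, np.inf)  # drop edges longer than eps
    np.fill_diagonal(D, 0.0)
    for k in range(len(m)):  # Floyd-Warshall relaxation through node k
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D, W

rng = np.random.default_rng(0)
theta_min, theta_max = 0.2, np.pi - 0.2
# Nodes 0 and 1 are the two endpoints; the rest are random samples of Lambda.
theta = np.concatenate([[theta_min, theta_max],
                        rng.uniform(theta_min, theta_max, size=200)])
m, h = sample_curve(theta)
D, W = graph_geodesic_distances(m, h, eps=0.3)

graph_dist = D[0, 1]                # graph approximation of W_Lambda
chord_dist = W[0, 1]                # extrinsic W between the endpoints
arc_length = theta_max - theta_min  # exact W_Lambda for r = 1
print(f"graph {graph_dist:.4f}  arc {arc_length:.4f}  chord {chord_dist:.4f}")
```

Because every graph edge is a short chord of the circle, the shortest path is an inscribed polyline. Its length sits just below the arc length and well above the direct chord, which mirrors the gap between $\mathbb{W}_Λ$ and $\mathbb{W}$.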

Paper Structure

This paper contains 33 sections, 19 theorems, 173 equations, and 7 figures.

Introduction
Motivation
The Riemannian structure of the Wasserstein-2 distance
Linearized optimal transport
Tangent space approximation quality
Manifold Learning
Outline and contribution
Notation and setting
Submanifolds in Wasserstein space
Parametrized submanifolds
Constructing submanifolds by diffeomorphic deformation of a template
Geodesic restriction of $\mathrm{W}$ to $\Lambda$ and pull-back of Riemannian tensor
Local linearization
Local linearization in the ambient space $\mathrm{W}$
Sampling and graph approximation
...and 18 more sections

Key Result

Proposition 1.1

For a curve $\rho \in \mathop{\mathrm{Lip}}\nolimits([0,1];(\mathcal{P}(\Omega),\mathrm{W}))$, consider the corresponding set of velocity fields $v$ that satisfy the continuity equation $\partial_t \rho_t + \nabla \cdot (v_t \rho_t) = 0$ with $\rho$ in a distributional sense, and set $\mathrm{energy}(\rho) := \inf_v \int_0^1 \|v_t\|_{\mathrm{L}^2(\rho_t)}^2 \,\mathrm{d}t$, where the infimum runs over this set. The map $\mathop{\mathrm{Lip}}\nolimits([0,1];(\mathcal{P}(\Omega),\mathrm{W})) \ni \rho \mapsto \mathrm{energy}(\rho)$ is lower-semicontinuous (with respect to uniform conv…
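For orientation, the Benamou--Brenier formula expresses $\mathrm{W}$ as a minimal energy. Restricting the admissible curves to $\Lambda$ gives one natural way to write the geodesic restriction $\mathrm{W}_\Lambda$. This sketch assumes that $\mathrm{energy}(\rho)$ is the time-integrated kinetic energy, minimized over velocity fields solving the continuity equation. The paper's precise statements, Proposition 1.1 and Proposition 2.8, may differ in details such as the regularity required of admissible curves.

```latex
\begin{aligned}
\mathrm{energy}(\rho) &= \inf\Big\{ \int_0^1 \|v_t\|_{\mathrm{L}^2(\rho_t)}^2 \,\mathrm{d}t \;:\; \partial_t \rho_t + \nabla\cdot(v_t\,\rho_t) = 0 \Big\},\\
\mathrm{W}(\mu_0,\mu_1)^2 &= \min\big\{ \mathrm{energy}(\rho) \;:\; \rho \in \mathop{\mathrm{Lip}}\nolimits([0,1];(\mathcal{P}(\Omega),\mathrm{W})),\ \rho(0)=\mu_0,\ \rho(1)=\mu_1 \big\},\\
\mathrm{W}_\Lambda(\mu_0,\mu_1)^2 &= \inf\big\{ \mathrm{energy}(\rho) \;:\; \rho \in \mathop{\mathrm{Lip}}\nolimits([0,1];(\mathcal{P}(\Omega),\mathrm{W})),\ \rho(t)\in\Lambda \ \forall t,\ \rho(0)=\mu_0,\ \rho(1)=\mu_1 \big\}.
\end{aligned}
```

The third infimum runs over a subset of the curves admitted in the second, so $\mathrm{W}(\mu_0,\mu_1) \le \mathrm{W}_\Lambda(\mu_0,\mu_1)$ for all $\mu_0,\mu_1 \in \Lambda$.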

Figures (7)

Figure 1: Description of the pieces of the proof of Gromov--Wasserstein convergence. (Top left) Partitions of $\Lambda$ that get mapped to discrete points in $\Lambda^N$ via $\hat{S}_N$. (Top Right) Samples $\lambda_i$ of $\Lambda^N$. (Bottom left) Continuous path from $\mu_0$ to $\mu_1$ in $\Lambda$. (Bottom right) Piecewise chordal path in $\Lambda^N$ from $\nu_0$ to $\nu_1$ passing through the samples $\lambda_i$.
Figure 1: Example for one-dimensional base space. Left: Lebesgue densities $l_{j}$ for some sample measures $(\lambda_{j})_j$ from the manifold $\Lambda$. Middle: For fixed $\theta \in \mathcal{S}$ and various tangent vectors $(\eta_i)_i$ in $T_\theta \mathcal{S}$, the difference between $v_i := B_{\theta} \eta_i$ and $w_i^t := (T_i^t-\mathop{\mathrm{id}}\nolimits)/t$ in $\mathrm{L}^2(\lambda_\theta)$ is shown over $t$, where $T_i^t$ is the optimal transport map from $\lambda_\theta=E(\theta)$ to $E(\theta + t \cdot \eta_i)$. As expected, the difference scales linearly in $t$, as indicated by the black line. Right: Leading eigenvalues of the span operator $F(W^t)$ from the tangent-space recovery theorem, over $t$. As $t \to 0$ we recover three non-zero eigenvalues, corresponding to the dimension of $\mathcal{S}$. Residual eigenvalues decay as $O(t^2)$. See text for full details.
Figure 2: Example for one-dimensional parameter space. At $\theta=0$, $\lambda_0=\lambda$ is the uniform probability measure on $[-1,1] \times [-0.5,0.5]$, discretized by a uniform Cartesian grid and shown in grey in the left panel. For several other values of $\theta$ the deformed point cloud is shown. Note that the point density is no longer uniform for $\theta \neq 0$, which the figure does not show explicitly. Black arrows show a subsample of the velocity field $v(\theta,\cdot)=\nabla_x \phi(\theta,\cdot)$. The red crosses indicate the positions $z_1(\theta)$, $z_2(\theta)$ used in the parametrization of the velocity field. See text for full details.
Figure 3: Example for one-dimensional parameter space. Left: Different notions of pairwise distances on the manifold of Figure 2. Shown are $\mathrm{W}_\Lambda(\lambda_0,\lambda_\theta)$, $\mathrm{W}(\lambda_0,\lambda_\theta)$, and $\|\psi(\theta,\cdot)-\mathop{\mathrm{id}}\nolimits\|_{\mathrm{L}^2(\lambda_0)}$ for various $\theta$. Right: Difference between $v_i := B_{\theta} \eta_i$ and $w_i^t := (T_i^t-\mathop{\mathrm{id}}\nolimits)/t$ in $\mathrm{L}^2(\lambda_0)$ over $t$ for various tangent vectors $\eta_i$, similar to the middle panel of the one-dimensional base-space example (Figure 1). Again we observe linear scaling, as indicated by the black line. All results shown here are based on a discretization of $\lambda$ with $200 \times 100$ points.
Figure 4: Example for one-dimensional parameter space. Spectrum of the span operator $F(W^t)$ for different $t$ and different grid resolutions ($50 \times 25$, $100 \times 50$, $200 \times 100$). For comparison, the spectrum of the operator $F(Z^t)$ is shown, where $Z^t$ is constructed from the finite differences $\psi(t \cdot \eta_i,\cdot)-\mathop{\mathrm{id}}\nolimits$. For visual clarity only the four largest eigenvalues are shown. For small $t$ the spectra of $F(W^t)$ and $F(Z^t)$ agree; the precise value of $t$ at which they start to differ depends on the grid discretization scale. Non-dominant eigenvalues decay like $O(t^2)$. See text for full details.
...and 2 more figures
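The captions above describe two numerical checks. First, $w_i^t = (T_i^t-\mathop{\mathrm{id}}\nolimits)/t$ approaches $v_i = B_\theta \eta_i$ with an error that is linear in $t$. Second, the span operator $F(W^t)$ has as many dominant eigenvalues as $\dim \mathcal{S}$, and the residual eigenvalues decay like $O(t^2)$. Below is a minimal sketch of both checks in a one-dimensional base space at $\theta = 0$, under several assumptions. Measures are represented by $n$ equally weighted atoms, so the 1D optimal transport map is the monotone (sorted) matching. The deformation `psi` is a hypothetical two-parameter map whose velocity $B_0\eta$ is known in closed form. $F(W)$ is taken to be $\sum_i w_i \otimes w_i$ on $\mathrm{L}^2(\lambda)$, whose non-zero eigenvalues coincide with those of the Gram matrix $\langle w_i, w_j\rangle_{\mathrm{L}^2(\lambda)}$. This is an illustration, not the paper's experimental code.

```python
import numpy as np

n = 2000
x = (np.arange(n) + 0.5) / n  # atoms of lambda_0 = Unif[0, 1], weight 1/n each

def psi(theta, x):
    """Hypothetical deformation: lambda_theta is the push-forward of lambda_0 by psi(theta, .)."""
    return x + theta[0] + 0.3 * theta[1] * np.sin(np.pi * x) + theta[0] * theta[1] * x

def velocity(eta, x):
    """B_0 eta: derivative of psi(t * eta, x) with respect to t at t = 0."""
    return eta[0] + 0.3 * eta[1] * np.sin(np.pi * x)

def ot_displacement_1d(src, tgt):
    """T(src) - src for the 1D optimal map between two uniform empirical measures
    with equally many atoms: the monotone rearrangement matches sorted atoms."""
    order = np.argsort(src)
    disp = np.empty_like(src)
    disp[order] = np.sort(tgt) - src[order]
    return disp

def finite_difference_fields(etas, t):
    """Rows w_i^t = (T_i^t - id) / t, with T_i^t the OT map from lambda_0 to lambda_{t eta_i}."""
    return np.stack([ot_displacement_1d(x, psi(t * eta, x)) / t for eta in etas])

def l2_norm(f):
    """Norm in L2(lambda_0), i.e. with atom weights 1/n."""
    return np.sqrt(np.mean(f ** 2))

def span_spectrum(Wt):
    """Eigenvalues (descending) of F(W) = sum_i w_i (x) w_i, computed via the Gram matrix."""
    gram = Wt @ Wt.T / n
    return np.sort(np.linalg.eigvalsh(gram))[::-1]

etas = np.array([[1, 0], [0, 1], [1, 1], [1, -1], [2, 1]], dtype=float)

# Check 1: ||w^t - B_0 eta|| should shrink linearly in t.
errors = {t: l2_norm(finite_difference_fields(etas[2:3], t)[0] - velocity(etas[2], x))
          for t in (1e-2, 1e-3)}
print("error ratio:", errors[1e-2] / errors[1e-3])  # close to 10 for linear scaling

# Check 2: two dominant eigenvalues (dim S = 2 in this toy example); the rest decay like t^2.
for t in (1e-1, 1e-3):
    print(t, span_spectrum(finite_difference_fields(etas, t))[:3])
```

In this toy family the second-order term $\theta_1\theta_2 x$ is what makes $w_i^t$ differ from $B_0\eta_i$. It contributes an error of order $t$ in each field and a third eigenvalue of order $t^2$, which matches the qualitative behaviour reported in the captions.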

Theorems & Definitions (50)

Proposition 1.1: Energy functional and Benamou--Brenier formulation
Remark 2.2: On the regularity assumptions
Remark 2.3: On the choice of the parameter manifold $\mathcal{S}$
Remark 2.4: Relation of equation (VelTimeSmooth) to the geodesic equation for $\mathrm{W}$
Lemma 2.5: Well-posedness and regularity of flows
Proof 1
Example 2.6: Translations
Example 2.7: Dilation
Proposition 2.8: Geodesic restriction of $\mathrm{W}$ to $\Lambda$
Proof 2
...and 40 more
