Wasserstein convergence of Čech persistence diagrams for samplings of submanifolds

Charles Arnal; David Cohen-Steiner; Vincent Divol

TL;DR

This work analyzes the stability and convergence of Čech persistence diagrams under the Wasserstein-type metric $\mathrm{OT}_p$ for data sampled on an $m$-dimensional submanifold of ${\mathbb R}^d$. It proves that $\mathrm{OT}_p$ convergence occurs exactly when $p>m$, and provides a quadratic improvement of bottleneck stability under positive reach, together with laws of large numbers for the total $\alpha$-persistence. The results cover deterministic, generic, and random sampling settings, yielding explicit bounds on $\mathrm{OT}_p(\mathrm{dgm}_i({\mathsf A}),\mathrm{dgm}_i({\mathsf M}))$ and asymptotics for $\mathrm{Pers}_\alpha(\mathrm{dgm}_i({\mathsf A}_n))$, with sharp dependence on the intrinsic dimension $m$ rather than the ambient dimension $d$. These findings also imply regularity of ML-oriented feature maps on the space of PDs and are validated by numerical experiments, highlighting practical implications for topological data analysis of high-dimensional data modeled by manifolds.
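To make the two quantities in the TL;DR concrete, here is a minimal sketch (not the authors' code) of the total $\alpha$-persistence $\mathrm{Pers}_\alpha(D)=\sum_{(b,d)\in D}(d-b)^\alpha$ and of $\mathrm{OT}_p$ between two finite diagrams, computed as an optimal partial matching in which points may also be sent to the diagonal. It assumes the sup-norm as ground metric on ${\mathbb R}^2$, so sending $(b,d)$ to the diagonal costs $(d-b)/2$; the paper's exact conventions may differ. The helper names `total_persistence` and `ot_p` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def total_persistence(D, alpha=1.0):
    """Pers_alpha(D): sum of (death - birth) ** alpha over the points of D."""
    D = np.asarray(D, dtype=float).reshape(-1, 2)
    return float(np.sum((D[:, 1] - D[:, 0]) ** alpha))


def ot_p(D1, D2, p=2.0):
    """OT_p distance between two finite persistence diagrams.

    D1, D2 hold (birth, death) rows. Assumption: the ground metric on R^2 is
    the sup-norm, so sending (b, d) to the diagonal costs (d - b) / 2.
    """
    D1 = np.asarray(D1, dtype=float).reshape(-1, 2)
    D2 = np.asarray(D2, dtype=float).reshape(-1, 2)
    n1, n2 = len(D1), len(D2)
    if n1 + n2 == 0:
        return 0.0
    # Sup-norm cost between every pair of off-diagonal points.
    cross = np.max(np.abs(D1[:, None, :] - D2[None, :, :]), axis=2) ** p
    to_diag1 = ((D1[:, 1] - D1[:, 0]) / 2.0) ** p
    to_diag2 = ((D2[:, 1] - D2[:, 0]) / 2.0) ** p
    # Augmented square cost matrix: each point is matched either to a point
    # of the other diagram or to one of the diagonal "slots".
    C = np.zeros((n1 + n2, n2 + n1))
    C[:n1, :n2] = cross
    C[:n1, n2:] = to_diag1[:, None]  # point of D1 sent to the diagonal
    C[n1:, :n2] = to_diag2[None, :]  # point of D2 sent to the diagonal
    # The bottom-right block stays 0: diagonal matched to diagonal is free.
    rows, cols = linear_sum_assignment(C)
    return float(C[rows, cols].sum() ** (1.0 / p))
```

With these conventions, $\mathrm{OT}_p(D,\emptyset)^p = 2^{-p}\,\mathrm{Pers}_p(D)$, which is one way to see why bounds on $\mathrm{OT}_p$ and on the total persistence go hand in hand. For example, `ot_p([[0, 1], [0.5, 2]], [], p=2) ** 2` equals `total_persistence([[0, 1], [0.5, 2]], 2) / 4`.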

Abstract

Čech persistence diagrams (PDs) are topological descriptors routinely used to capture the geometry of complex datasets. They are commonly compared using the Wasserstein distances $\mathrm{OT}_p$; however, the extent to which PDs are stable with respect to these metrics remains poorly understood. We partially close this gap by focusing on the case where datasets are sampled on an $m$-dimensional submanifold of $\mathbb{R}^{d}$. Under this manifold hypothesis, we show that convergence with respect to the $\mathrm{OT}_p$ metric happens exactly when $p>m$. We also provide improvements upon the bottleneck stability theorem in this case and prove new laws of large numbers for the total $\alpha$-persistence of PDs. Finally, we show how these theoretical findings shed new light on the behavior of the feature maps on the space of PDs that are used in ML-oriented applications of Topological Data Analysis.
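As an illustration of the kind of feature map meant here, the sketch below computes a persistence image with weight $\mathrm{pers}^p$, the weighting used in Figure 4. It is a minimal sketch under stated assumptions, not the paper's construction: each diagram point is moved to (birth, persistence) coordinates, smoothed by an isotropic Gaussian evaluated at pixel centres, and weighted by its persistence to the power `p`. The function name and the parameters `sigma`, `resolution` and `bounds` are hypothetical.

```python
import numpy as np


def persistence_image(D, p=1.0, sigma=0.05, resolution=20, bounds=(0.0, 1.0, 0.0, 1.0)):
    """Hypothetical persistence image of diagram D with weight pers ** p.

    D holds (birth, death) rows. Each point is mapped to (birth, pers),
    spread by a Gaussian of bandwidth sigma and weighted by pers ** p; the
    image is evaluated at pixel centres (no pixel-wise integration).
    """
    D = np.asarray(D, dtype=float).reshape(-1, 2)
    birth, pers = D[:, 0], D[:, 1] - D[:, 0]
    bmin, bmax, pmin, pmax = bounds
    xs = np.linspace(bmin, bmax, resolution)  # birth axis (columns)
    ys = np.linspace(pmin, pmax, resolution)  # persistence axis (rows)
    X, Y = np.meshgrid(xs, ys)
    img = np.zeros_like(X)
    for b, q, w in zip(birth, pers, pers ** p):
        img += w * np.exp(-((X - b) ** 2 + (Y - q) ** 2) / (2.0 * sigma ** 2))
    return img / (2.0 * np.pi * sigma ** 2)
```

The map is linear in the diagram (the image of a union of diagrams is the sum of the images), and increasing `p` down-weights the many small-persistence points near the diagonal relative to the few long-lived ones.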

Paper Structure

This paper contains 12 sections, 34 theorems, 99 equations, and 6 figures.

Introduction
Čech persistence diagrams for subsets of submanifolds
Čech persistence diagrams for subsets of generic submanifolds
Random samplings of submanifolds
Region (1)
Regions (2)-(3)
Consequences for the Wasserstein convergence of persistence diagrams
Numerical experiments
Conclusion
Proofs of additional lemmas on generic submanifolds
Proofs of additional lemmas on random samplings
Proofs of the corollaries on Wasserstein convergence and on the regularity of feature maps

Key Result

Lemma 2.1

If $0<a<b$ are such that $d_{\mathsf A}^{-1}[a,b]$ contains no differential critical point of $d_{\mathsf A}$, then $d_{\mathsf A}^{-1}(-\infty,a]$ is a deformation retract of $d_{\mathsf A}^{-1}(-\infty,b]$. Consequently, any $(u_1,u_2) \in \mathrm{dgm}_i({\mathsf A})$ is such that $u_1,u_2 \not\in$ …
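For readers less familiar with the objects in Lemma 2.1, the standard setup is recalled below, assuming the usual convention of closed balls of radius $t$ (the paper's exact conventions may differ). The sublevel sets of the distance function $d_{\mathsf A}$ of a compact set ${\mathsf A}$ are its $t$-offsets, as in Figure 1, and for finite ${\mathsf A}$ the nerve theorem identifies them, up to homotopy and compatibly with inclusions, with the Čech complexes $\check{C}_t({\mathsf A})$; $\mathrm{dgm}_i({\mathsf A})$ records the births and deaths of $i$-dimensional homology classes along this filtration.

```latex
d_{\mathsf A}(x) = \min_{a \in {\mathsf A}} \|x - a\|, \qquad
d_{\mathsf A}^{-1}(-\infty, t] = {\mathsf A}^t = \bigcup_{a \in {\mathsf A}} \bar B(a, t) \quad (t \ge 0),

\check{C}_t({\mathsf A}) = \Big\{ \sigma \subseteq {\mathsf A} \text{ finite} :
  \bigcap_{a \in \sigma} \bar B(a, t) \neq \emptyset \Big\},
\qquad {\mathsf A}^t \simeq \check{C}_t({\mathsf A}) \ \text{for finite } {\mathsf A}.
```

Read this way, Lemma 2.1 says that the topology of the filtration $({\mathsf A}^t)_{t\ge 0}$ cannot change over an interval $[a,b]$ in which $d_{\mathsf A}$ has no critical point.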

Figures (6)

Figure 1: The Čech PD of a point cloud ${\mathsf A}$ in ${\mathbb R}^2$ for $i=1$ and its $t$-offsets. The two points far from the diagonal $\partial \Omega$ in $\mathrm{dgm}_i({\mathsf A})$ correspond to the two large cycles in the set ${\mathsf A}$.
Figure 2: PDs of ${\mathsf M}$ (red) and of ${\mathsf A}$ (black).
Figure 3: A generic torus.
Figure 4: Left: the Čech PD $\mathrm{dgm}_1({\mathsf A}_n)$ of a sample of $n=10^4$ points sampled on a generic torus, with points in Regions (1), (2) and (3) highlighted in different colors. Right: the persistence images of $\mathrm{dgm}_1({\mathsf A}_n)$ with weight $\mathrm{pers}^p$ for different values of $p$.
Figure 5: Plot in log-log scale of $\mathrm{Pers}_p(\mathrm{dgm}_i^{(1)}({\mathsf A}_n))$ as a function of $n$ for points sampled on a circle, $i=0$ (left), points sampled on a torus, $i=0$ (center), points sampled on a torus, $i=1$ (right). Dashed lines have slopes equal to $1-p/m$.
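The $1-p/m$ slopes in Figure 5 can be reproduced in a rough way for the simplest case, $i=0$ on a circle ($m=1$). The sketch below is not the authors' experimental code. It relies on the fact that two radius-$t$ balls meet iff $t \ge \|x-y\|/2$, so all points of the Čech diagram $\mathrm{dgm}_0$ are born at $0$ and the finite deaths are half the edge lengths of a Euclidean minimum spanning tree. It then fits the log-log slope of $\mathrm{Pers}_p$ against $n$ for uniform samples. Unlike Figure 5, it uses all finite points of $\mathrm{dgm}_0$ rather than the Region (1) part $\mathrm{dgm}_i^{(1)}$; the helper names, sample sizes and number of repetitions are hypothetical choices.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def cech_dgm0_deaths(points):
    """Finite death times of dgm_0 for the Čech filtration (all births are 0).

    Two radius-t balls meet iff t >= |x - y| / 2, so the deaths are half the
    Euclidean MST edge lengths. Zero-length edges (duplicate points) would be
    dropped by scipy, so the input is assumed to have distinct points.
    """
    mst = minimum_spanning_tree(squareform(pdist(points)))
    return mst.data / 2.0


def total_persistence_dgm0(points, p):
    """Pers_p of the finite part of dgm_0 (the infinite bar is excluded)."""
    return float(np.sum(cech_dgm0_deaths(points) ** p))


def fitted_slope(ns, p=2.0, reps=4, seed=0):
    """Log-log slope of mean Pers_p versus n for uniform samples of the unit circle."""
    rng = np.random.default_rng(seed)
    means = []
    for n in ns:
        vals = []
        for _ in range(reps):
            theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
            pts = np.column_stack([np.cos(theta), np.sin(theta)])
            vals.append(total_persistence_dgm0(pts, p))
        means.append(np.mean(vals))
    return np.polyfit(np.log(ns), np.log(means), 1)[0]


if __name__ == "__main__":
    p, m = 2.0, 1
    s = fitted_slope([250, 500, 1000, 2000], p=p)
    print(f"fitted slope {s:.2f} vs predicted 1 - p/m = {1 - p / m:.2f}")
```

The dense distance matrix keeps the sketch short but costs $O(n^2)$ memory, so it is only meant for moderate $n$; averaging over a few repetitions per $n$ reduces the sampling noise in the fitted slope.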

Theorems & Definitions (63)

Lemma 2.1: Isotopy Lemma for Distance Functions
Lemma 2.2
proof
Theorem 2.3: Improved Bottleneck Stability Theorem
proof
Example 3.1
Definition 3.2: Topological Morse functions [morse1959topologically]
Lemma 3.3: Isotopy Lemma
Lemma 3.4: Handle Attachment Lemma
Theorem 3.5: Genericity Theorem [arnal2023critical]
