Wasserstein convergence of Čech persistence diagrams for samplings of submanifolds

Charles Arnal, David Cohen-Steiner, Vincent Divol

TL;DR

This work analyzes the stability and convergence of Čech persistence diagrams under the Wasserstein-type metric $\mathrm{OT}_p$ for data sampled on an $m$-dimensional submanifold of ${\mathbb R}^d$. It proves that $\mathrm{OT}_p$ convergence occurs exactly when $p>m$, and provides a quadratic improvement of bottleneck stability under positive reach, together with laws of large numbers for the total $\alpha$-persistence. The results cover deterministic, generic, and random sampling settings, yielding explicit bounds on $\mathrm{OT}_p(\mathrm{dgm}_i({\mathsf A}),\mathrm{dgm}_i({\mathsf M}))$ and asymptotics for $\mathrm{Pers}_\alpha(\mathrm{dgm}_i({\mathsf A}_n))$ that depend on the intrinsic dimension $m$ rather than the ambient dimension $d$. These findings also imply regularity for ML-oriented feature maps on the space of PDs and are validated by numerical experiments, with practical implications for topological data analysis of high-dimensional data modeled by manifolds.

Abstract

Čech persistence diagrams (PDs) are topological descriptors routinely used to capture the geometry of complex datasets. They are commonly compared using the Wasserstein distances $\mathrm{OT}_p$; however, the extent to which PDs are stable with respect to these metrics remains poorly understood. We partially close this gap by focusing on the case where datasets are sampled on an $m$-dimensional submanifold of ${\mathbb R}^d$. Under this manifold hypothesis, we show that convergence with respect to the $\mathrm{OT}_p$ metric happens exactly when $p>m$. We also provide improvements upon the bottleneck stability theorem in this case and prove new laws of large numbers for the total $\alpha$-persistence of PDs. Finally, we show how these theoretical findings shed new light on the behavior of the feature maps on the space of PDs that are used in ML-oriented applications of Topological Data Analysis.
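To make the threshold $p>m$ concrete, here is a minimal Python sketch on a toy case that is not taken from the paper: $n$ equally spaced points on the unit circle, so $m=1$. It assumes the convention $\mathrm{Pers}_\alpha = \sum (u_2-u_1)^\alpha$ over the finite points $(u_1,u_2)$ of a diagram, with the Čech filtration indexed by ball radius; the paper's normalization may differ by a constant factor. The function names `total_persistence`, `h0_diagram_equispaced_circle`, and `loglog_slope` are hypothetical helpers introduced for illustration.

```python
import math


def total_persistence(diagram, alpha):
    """Sum of (death - birth)**alpha over the finite points of a diagram.

    Assumed convention: persistence = death - birth; infinite bars are skipped.
    """
    return sum((d - b) ** alpha for b, d in diagram if math.isfinite(d))


def h0_diagram_equispaced_circle(n):
    """Finite degree-0 bars for n equally spaced points on the unit circle.

    Toy model: every component is born at radius 0, and neighbouring balls
    meet at half the chord length, sin(pi / n), giving n - 1 finite bars.
    """
    death = math.sin(math.pi / n)
    return [(0.0, death)] * (n - 1)


def loglog_slope(n1, n2, alpha):
    """Slope of log Pers_alpha against log n between sample sizes n1 and n2."""
    p1 = total_persistence(h0_diagram_equispaced_circle(n1), alpha)
    p2 = total_persistence(h0_diagram_equispaced_circle(n2), alpha)
    return (math.log(p2) - math.log(p1)) / (math.log(n2) - math.log(n1))


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, loglog_slope(1_000, 10_000, alpha))
```

In this toy case $\mathrm{Pers}_\alpha = (n-1)\sin(\pi/n)^\alpha$, which behaves like $\pi^\alpha n^{1-\alpha}$ for large $n$: the total persistence of the small bars vanishes when $\alpha>1=m$ and blows up when $\alpha<1$. The measured slope is close to $1-\alpha/m$, the same exponent as the dashed lines described in the caption of Figure 5.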

Paper Structure (12 sections, 34 theorems, 99 equations, 6 figures)


Key Result

Lemma 2.1

If $0<a<b$ are such that $d_{\mathsf A}^{-1}[a,b]$ contains no differential critical point of $d_{\mathsf A}$, then $d_{\mathsf A}^{-1}(-\infty,a]$ is a deformation retract of $d_{\mathsf A}^{-1}(-\infty,b]$. Consequently, any $(u_1,u_2) \in \mathrm{dgm}_i({\mathsf A})$ is such that $u_1,u_2 \notin$ [...]

Figures (6)

  • Figure 1: The Čech PD of a point cloud ${\mathsf A}$ in ${\mathbb R}^2$ for $i=1$ and its $t$-offsets. The two points far from the diagonal $\partial \Omega$ in $\mathrm{dgm}_i({\mathsf A})$ correspond to the two large cycles in the set ${\mathsf A}$.
  • Figure 2: PDs of ${\mathsf M}$ (red) and of ${\mathsf A}$ (black).
  • Figure 3: A generic torus.
  • Figure 4: Left: the Čech PD $\mathrm{dgm}_1({\mathsf A}_n)$ of a sample of $n=10^4$ points sampled on a generic torus, with points in Regions (1), (2) and (3) highlighted in different colors. Right: the persistence images of $\mathrm{dgm}_1({\mathsf A}_n)$ with weight $\mathrm{pers}^p$ for different values of $p$.
  • Figure 5: Plot in log-log scale of $\mathrm{Pers}_p(\mathrm{dgm}_i^{(1)}({\mathsf A}_n))$ as a function of $n$ for points sampled on a circle, $i=0$ (left), points sampled on a torus, $i=0$ (center), points sampled on a torus, $i=1$ (right). Dashed lines have slopes equal to $1-p/m$.
  • ...and 1 more figure

Theorems & Definitions (63)

  • Lemma 2.1: Isotopy Lemma for Distance Functions
  • Lemma 2.2
  • proof
  • Theorem 2.3: Improved Bottleneck Stability Theorem
  • proof
  • Example 3.1
  • Definition 3.2: Topological Morse functions [morse1959topologically]
  • Lemma 3.3: Isotopy Lemma
  • Lemma 3.4: Handle Attachment Lemma
  • Theorem 3.5: Genericity Theorem [arnal2023critical]
  • ...and 53 more