Wasserstein convergence of Čech persistence diagrams for samplings of submanifolds
Charles Arnal, David Cohen-Steiner, Vincent Divol
TL;DR
This work analyzes the stability and convergence of Čech persistence diagrams under the Wasserstein-type metric $\mathrm{OT}_p$ for data sampled on an $m$-dimensional submanifold of $\mathbb{R}^d$. It proves that $\mathrm{OT}_p$ convergence occurs exactly when $p > m$, and provides a quadratic improvement of bottleneck stability under positive reach, together with laws of large numbers for the total $\alpha$-persistence. The results cover deterministic, generic, and random sampling settings, yielding explicit bounds on $\mathrm{OT}_p(\mathrm{dgm}_i(\mathsf{A}),\mathrm{dgm}_i(\mathsf{M}))$ and asymptotics for $\mathrm{Pers}_\alpha(\mathrm{dgm}_i(\mathsf{A}_n))$ that depend on the intrinsic dimension $m$ rather than the ambient dimension $d$. These findings also imply regularity for ML-oriented feature maps on the space of PDs and are validated by numerical experiments, with practical implications for topological data analysis of high-dimensional data modeled by manifolds.
Abstract
Čech persistence diagrams (PDs) are topological descriptors routinely used to capture the geometry of complex datasets. They are commonly compared using the Wasserstein distances $\mathrm{OT}_p$; however, the extent to which PDs are stable with respect to these metrics remains poorly understood. We partially close this gap by focusing on the case where datasets are sampled on an $m$-dimensional submanifold of $\mathbb{R}^d$. Under this manifold hypothesis, we show that convergence with respect to the $\mathrm{OT}_p$ metric happens exactly when $p > m$. We also provide improvements upon the bottleneck stability theorem in this case and prove new laws of large numbers for the total $\alpha$-persistence of PDs. Finally, we show how these theoretical findings shed new light on the behavior of the feature maps on the space of PDs that are used in ML-oriented applications of Topological Data Analysis.
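For orientation, the two quantities named above can be written out as follows. This is a sketch using the standard definitions from the persistence literature, stated for finite diagrams; the symbols $a$, $b$, $\Delta$, $\gamma$, $(u,v)$ and the unspecified norm $\|\cdot\|$ are assumptions introduced here, and the paper's exact conventions (for instance the choice of norm on $\mathbb{R}^2$) may differ. A diagram $a$ is treated as a finite multiset of points $(u,v)$ with $u < v$, and $\Delta$ denotes the diagonal $\{(u,u)\}$:

$$
\mathrm{Pers}_\alpha(a) = \sum_{(u,v)\in a} (v-u)^{\alpha},
\qquad
\mathrm{OT}_p(a,b) = \inf_{\gamma} \Big( \sum_{x \in a \cup \Delta} \|x - \gamma(x)\|^{p} \Big)^{1/p},
$$

where the infimum runs over bijections $\gamma : a \cup \Delta \to b \cup \Delta$, so that any point of either diagram may be matched to the diagonal. Letting $p \to \infty$ replaces the sum by a supremum and gives the bottleneck distance referred to in the abstract.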
