MaxTDA: Robust Statistical Inference for Maximal Persistence in Topological Data Analysis

Sixtus Dakurah, Jessi Cisewski-Kehe

TL;DR

MaxTDA tackles the underestimation of maximal persistence under robust TDA by marrying KDE-based smoothing with level-set thresholding and rejection sampling to create dense, topology-preserving samples. This yields consistent estimators for the maximal persistence and permits statistical inference through bootstrap-based rejection bands. The authors prove consistency of the subsampling scheme and stability of the maximal-persistence estimator, and they develop a Monte Carlo procedure to quantify significance. Numerical experiments, including exoplanet time-series analyses, demonstrate improved recovery of true topological signals and meaningful statistical conclusions. The framework offers a principled, practical approach for reliable topological signal quantification in noisy, density-varied data.

Abstract

Persistent homology is an area within topological data analysis (TDA) that can uncover holes of different dimensions (connected components, loops, voids, etc.) in data. The holes are characterized, in part, by how long they persist across different scales. Noisy data can result in many additional holes that are not true topological signal. Various robust TDA techniques have been proposed to reduce the number of noisy holes; however, these robust methods also tend to reduce the topological signal. This work introduces Maximal TDA (MaxTDA), a statistical framework addressing a limitation in TDA wherein robust inference techniques systematically underestimate the persistence of significant homological features. MaxTDA combines kernel density estimation with level-set thresholding via rejection sampling to generate consistent estimators for the maximal persistence features that minimize bias while maintaining robustness to noise and outliers. We establish the consistency of the sampling procedure and the stability of the maximal persistence estimator. The framework also enables statistical inference on topological features through rejection bands, constructed from quantiles that bound the estimator's deviation probability. MaxTDA is particularly valuable in applications where precise quantification of statistically significant topological features is essential for revealing underlying structural properties in complex datasets. Numerical simulations across varied datasets, including an example from exoplanet astronomy, highlight the effectiveness of MaxTDA in recovering true topological signals.
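The sampling idea described in the abstract (a kernel density estimate, a level-set threshold, and rejection sampling) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `thresholded_kde_sample`, the Gaussian kernel with SciPy's default bandwidth, and the choice of the threshold as a quantile of the KDE values at the data points are all hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde


def thresholded_kde_sample(points, n_samples, level_quantile=0.3, seed=0):
    """Draw n_samples points from a KDE restricted to its upper level set.

    points: array of shape (n, d). The threshold lam is an assumed choice:
    the `level_quantile` quantile of the KDE evaluated at the data.
    """
    rng = np.random.default_rng(seed)
    kde = gaussian_kde(points.T)  # SciPy expects shape (d, n)
    lam = np.quantile(kde(points.T), level_quantile)

    accepted, n_accepted = [], 0
    while n_accepted < n_samples:
        # Propose from the KDE itself, then reject proposals whose
        # estimated density falls below the level-set threshold.
        candidates = kde.resample(2 * n_samples, seed=rng)  # shape (d, m)
        keep = kde(candidates) >= lam
        accepted.append(candidates[:, keep].T)
        n_accepted += int(keep.sum())
    return np.vstack(accepted)[:n_samples], lam


# Toy data: a noisy circle (signal) plus uniform outliers (noise).
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, 300)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
circle += 0.05 * rng.normal(size=circle.shape)
outliers = rng.uniform(-2.0, 2.0, size=(30, 2))
data = np.vstack([circle, outliers])

sample, lam = thresholded_kde_sample(data, n_samples=200)
```

The returned `sample` is denser than the raw data near the circle and contains no points in regions where the estimated density is below `lam`. Persistent homology would then be computed on `sample` rather than on `data`.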

Paper Structure

This paper contains 27 sections, 4 theorems, 21 equations, 10 figures, 1 algorithm.

Key Result

Lemma 2

Let $\Delta$ be the persistence diagram with points only along the diagonal. Let $\phi_n$ be an empirical KDE or DTM function defined on the sample $\mathbb{X}_n$. Then the following results hold: (i) The maximum persistence can be expressed in terms of the bottleneck distance: (ii) Let $\widehat{\nabla}$ be defined as $\widehat{\nabla} = \left| \text{mp}[\widehat{\text{Dgm}}] - \text{mp}[\text{D
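As a hedged illustration of part (i), the sketch below computes the maximal persistence of a diagram and its bottleneck distance to the diagonal-only diagram $\Delta$. It assumes the common conventions that persistence is death minus birth and that the bottleneck distance uses the $L_\infty$ norm, under which the maximal persistence is twice that distance; the paper's own normalization may differ. The function names are hypothetical.

```python
import numpy as np


def max_persistence(diagram):
    """Largest (death - birth) over the points of a persistence diagram."""
    dgm = np.asarray(diagram, dtype=float).reshape(-1, 2)
    if dgm.shape[0] == 0:
        return 0.0
    return float(np.max(dgm[:, 1] - dgm[:, 0]))


def bottleneck_to_diagonal(diagram):
    """Bottleneck distance to the diagram with points only on the diagonal.

    Every off-diagonal point (b, d) must be matched to the diagonal, and
    its nearest diagonal point in the L-infinity norm is at distance
    (d - b) / 2, so the distance is the largest such value.
    """
    dgm = np.asarray(diagram, dtype=float).reshape(-1, 2)
    if dgm.shape[0] == 0:
        return 0.0
    return float(np.max((dgm[:, 1] - dgm[:, 0]) / 2.0))


# Toy diagram of (birth, death) pairs, made up for illustration.
toy_diagram = [(0.0, 1.0), (0.25, 0.5), (0.5, 2.0)]
mp = max_persistence(toy_diagram)
dist = bottleneck_to_diagonal(toy_diagram)
```

For this toy diagram the most persistent point is (0.5, 2.0), so `mp` equals `2 * dist` under the assumed conventions.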

Figures (10)

  • Figure 1: Illustration of the MaxTDA framework. For a data space (left), robust TDA methods apply a robust filter (e.g., KDE) to the data (middle). MaxTDA extends this by sampling from a thresholded KDE (right), enhancing robustness to noise and creating a denser sampling surface.
  • Figure 2: VR filtration and persistence diagram. The zero-simplices (black points, a-c) are sampled randomly around a circle. Balls (cyan) of diameter $\delta=0.8$ and $\delta=1.5$ are drawn around the points in (b) and (c), respectively, resulting in one-simplices (black segments) and two-simplices (orange triangles). The persistence diagram (d) has $H_0$ (red points) and $H_1$ (blue triangles) features.
  • Figure 3: An illustration of the VR (b), DTM (c), and KDE (d) filtration on the point cloud $\mathbb{X}_n$ (a) (the blue points are signal and the black points are noise). All three methods identified one dominant $H_1$ feature in terms of persistence.
  • Figure 4: An illustration of the VR (b), DTM (c), and KDE (d) filtration on the point cloud $\mathbb{X}_n^*$ from Algorithm 1. All three methods identified one dominant and enhanced $H_1$ feature.
  • Figure 5: MaxTDA estimation results. (a) For an appropriately chosen threshold, the maximal persistence associated with the MaxTDA $\mathbb{X}_{n, \lambda}^*$ (red circles) closely approximates the ground truth ($\mathbb{X}$) maximal persistence (orange triangles). (b) The distribution of the difference in maximal persistence between the three data samples and the ground truth across 100 independent trials, demonstrating that $\mathbb{X}_{n, \lambda}^*$ (red) maximal persistence is less biased.
  • ...and 5 more figures

Theorems & Definitions (7)

  • Lemma 2: Maximal Persistence Stability
  • proof
  • Theorem 3: Convergence of Smooth Subsamples
  • Theorem 4: Consistency
  • proof
  • Lemma 5
  • Remark 6