
Persistent Homology for High-dimensional Data Based on Spectral Methods

Sebastian Damrich, Philipp Berens, Dmitry Kobak

TL;DR

Spectral distances on the k-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, make it possible to detect the correct topology even in the presence of high-dimensional noise.

Abstract

Persistent homology is a popular computational tool for analyzing the topology of point clouds, such as the presence of loops or voids. However, many real-world datasets with low intrinsic dimensionality reside in an ambient space of much higher dimensionality. We show that in this case traditional persistent homology becomes very sensitive to noise and fails to detect the correct topology. The same holds true for existing refinements of persistent homology. As a remedy, we find that spectral distances on the k-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, make it possible to detect the correct topology even in the presence of high-dimensional noise. Moreover, we derive a novel closed-form formula for effective resistance, and describe its relation to diffusion distances. Finally, we apply these methods to high-dimensional single-cell RNA-sequencing data and show that spectral distances allow robust detection of cell cycle loops.
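The pipeline described above starts from a k-nearest-neighbor graph of the data. A minimal NumPy sketch of that step and of effective resistance follows, assuming an unweighted kNN graph symmetrized with an "or" rule and the standard pseudoinverse formula $R_{ij} = L^+_{ii} + L^+_{jj} - 2L^+_{ij}$ for the graph Laplacian $L = D - A$. The function names and the noisy-circle generator (Gaussian noise added in all 50 dimensions) are illustrative assumptions, not the authors' code.

```python
import numpy as np


def knn_graph(X, k):
    """Unweighted symmetric kNN adjacency: i ~ j if either point is among the
    other's k nearest neighbours ("or" symmetrisation, assumed here)."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(D2, np.inf)                   # never pick a point as its own neighbour
    nn = np.argsort(D2, axis=1)[:, :k]             # indices of the k nearest neighbours
    n = X.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return np.maximum(A, A.T)                      # symmetrise


def effective_resistance(A):
    """R_ij = L+_ii + L+_jj - 2 L+_ij with L = D - A (meaningful for connected graphs)."""
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    g = np.diag(Lp)
    return g[:, None] + g[None, :] - 2 * Lp


def noisy_circle(n, dim, sigma, rng):
    """Hypothetical generator: unit circle in the first two coordinates,
    plus isotropic Gaussian noise in all `dim` coordinates."""
    t = rng.uniform(0, 2 * np.pi, n)
    X = np.zeros((n, dim))
    X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
    return X + sigma * rng.standard_normal((n, dim))


rng = np.random.default_rng(0)
X = noisy_circle(n=500, dim=50, sigma=0.25, rng=rng)
A = knn_graph(X, k=15)
R = effective_resistance(A)   # 500 x 500 matrix of pairwise effective resistances
```

The matrix `R` can then be handed to a persistent homology package as a precomputed distance matrix. The dense pseudoinverse costs $O(n^3)$ time, so this sketch only suits datasets of up to a few thousand points.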

Paper Structure (61 sections, 16 theorems, 68 equations, 44 figures, 5 tables)

Key Result

Proposition 6.1

The corrected effective resistance distance can be computed by
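Since the formula of Proposition 6.1 is missing from this summary, the sketch below illustrates one candidate under stated assumptions. It assumes the corrected effective resistance is the amplified commute distance of von Luxburg et al., $\tilde R_{ij} = R_{ij} - 1/d_i - 1/d_j + 2w_{ij}/(d_i d_j) - w_{ii}/d_i^2 - w_{jj}/d_j^2$. Writing $R_{ij} = v^\top L_{\mathrm{sym}}^+ v$ with $v = e_i/\sqrt{d_i} - e_j/\sqrt{d_j}$, the correction terms collapse to $-v^\top(2I - L_{\mathrm{sym}})v$, which gives the spectral form $\tilde R_{ij} = \sum_{k\ge 2} \frac{(1-\lambda_k)^2}{\lambda_k}\big(U_{ik}/\sqrt{d_i} - U_{jk}/\sqrt{d_j}\big)^2$ for $L_{\mathrm{sym}} = U\,\mathrm{diag}(\lambda)\,U^\top$. This derivation was written for the example; whether it coincides with the paper's closed form is an assumption. Both routes are computed and compared.

```python
import numpy as np


def corrected_resistance_direct(A):
    """Assumed correction (amplified commute distance, von Luxburg et al.):
    R_ij - 1/d_i - 1/d_j + 2 w_ij/(d_i d_j) - w_ii/d_i^2 - w_jj/d_j^2."""
    d = A.sum(axis=1)
    Lp = np.linalg.pinv(np.diag(d) - A)
    g = np.diag(Lp)
    R = g[:, None] + g[None, :] - 2 * Lp           # plain effective resistance
    s = np.diag(A) / d**2                          # self-loop terms w_ii / d_i^2
    Rc = (R - 1 / d[:, None] - 1 / d[None, :]
          + 2 * A / np.outer(d, d) - s[:, None] - s[None, :])
    np.fill_diagonal(Rc, 0.0)                      # the correction is defined for i != j
    return Rc


def corrected_resistance_spectral(A):
    """Same quantity from the eigenpairs of L_sym = I - D^{-1/2} A D^{-1/2},
    weighting eigenvector k by (1 - lam_k)^2 / lam_k. Assumes a connected graph."""
    d = A.sum(axis=1)
    Dm = 1 / np.sqrt(d)
    Lsym = np.eye(len(d)) - Dm[:, None] * A * Dm[None, :]
    lam, U = np.linalg.eigh(Lsym)                  # ascending eigenvalues
    lam, U = lam[1:], U[:, 1:]                     # drop the trivial eigenpair (lam = 0)
    V = Dm[:, None] * U                            # V_ik = U_ik / sqrt(d_i)
    f = (1 - lam) ** 2 / lam                       # per-eigenvector weights
    G = (V * f) @ V.T                              # G_ij = sum_k f_k V_ik V_jk
    g = np.diag(G)
    Rc = g[:, None] + g[None, :] - 2 * G           # sum_k f_k (V_ik - V_jk)^2
    np.fill_diagonal(Rc, 0.0)
    return Rc


# Example: 8-cycle; opposite nodes have R = 2, corrected value 2 - 1/2 - 1/2 = 1
C8 = np.roll(np.eye(8), 1, axis=1)
C8 = C8 + C8.T
Rc_direct = corrected_resistance_direct(C8)
Rc_spectral = corrected_resistance_spectral(C8)
```

The weight $(1-\lambda)^2/\lambda$ vanishes at $\lambda = 1$, so in this form eigenvectors with eigenvalues near 1 contribute nothing to the corrected distance.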

Figures (44)

  • Figure 1: a. 2D PCA of a noisy circle ($\sigma=0.25$, radius 1) in $\mathbb R^{50}$. Overlaid are representative cycles of the most persistent loops. b. Persistence diagrams using Euclidean distance and the effective resistance. c. Loop detection scores of persistent homology using effective resistance and Euclidean distance. d, e. UMAP and $t$-SNE embeddings of the same data, showing the loop structure in 2D.
  • Figure 2: a. Persistent homology applied to a noisy circle ($n=10$) in 2D tracks appearing and disappearing holes as balls grow around each datapoint. Dotted lines show the graph edges that lead to the birth / death of two loops (Section \ref{sec:PH}). b. The corresponding persistence diagram with two detected 1D holes (loops). Our hole detection score measures the gap in persistence between the first and the second detected holes (Section \ref{para:performance_metric}).
  • Figure 3: a -- c. Persistence diagrams of a noisy circle for different ambient dimensionalities and different amounts of noise. Ideally, there should be one feature (point) with high persistence, corresponding to the circle, but for high noise and high dimensionality that feature vanishes into the noise cloud near the diagonal. d -- f. Multidimensional scaling of Euclidean, effective resistance, and diffusion distances for a noisy circle in $\mathbb R^{50}$. Color indicates the distance to the highlighted point.
  • Figure 4: Robustness of effective resistance. We sampled $n=1\,000$ points from a noisy circle in 2D with Gaussian noise of standard deviation $\sigma=0.1$, constructed the unweighted symmetric $15$-NN graph, and optionally added 10 random edges (thick lines). Node colors indicate the graph distance from the large black dot. a. The geodesic distance is severely affected by the random edges. b. The effective resistance distance is robust to them.
  • Figure 5: a. Eigenvalue spectra of the $k$NN graph Laplacian for the noisy circle in ambient $\mathbb{R}^{50}$ for noise levels $\sigma\in\{0.0, 0.1, 0.25\}$. b. Decay of the eigenvector contribution as a function of the eigenvalue for effective resistance, diffusion distances, and DPT. c -- e. Relative contribution of each eigenvector for effective resistance, diffusion distance, Laplacian Eigenmaps, and DPT for various noise levels (Section \ref{sec:spectral}).
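The Figure 2 caption describes the hole detection score only as the gap in persistence between the first and second most persistent holes. The function below is a hypothetical version that divides this gap by the largest persistence so that the score lies in [0, 1]; both the normalization and the name `hole_detection_score` are assumptions, not the paper's definition. It expects a 1D persistence diagram given as (birth, death) pairs with finite death times.

```python
def hole_detection_score(diagram):
    """Hypothetical normalised persistence gap (p1 - p2) / p1, where p1 >= p2 are
    the two largest persistences (death - birth) in a 1D persistence diagram.
    Returns 1.0 if there is a single feature and 0.0 for an empty diagram."""
    pers = sorted((death - birth for birth, death in diagram), reverse=True)
    if not pers or pers[0] <= 0:
        return 0.0
    p2 = pers[1] if len(pers) > 1 else 0.0         # runner-up persistence
    return (pers[0] - p2) / pers[0]


# One dominant loop (persistence 1.2) and two short-lived ones
score = hole_detection_score([(0.2, 1.4), (0.3, 0.5), (0.35, 0.4)])
```

Such a diagram could come, for example, from the `ripser` Python package applied to a precomputed Euclidean or effective-resistance distance matrix. Under this assumed definition, a score near 1 indicates one clearly dominant loop, and a score near 0 means the strongest loop cannot be told apart from the next one.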

Theorems & Definitions (34)

  • Proposition 6.1
  • Corollary 6.2
  • Lemma B.1
  • proof
  • Lemma B.2
  • proof
  • Lemma B.3
  • proof
  • Proposition B.4
  • proof
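The abstract relates effective resistance to diffusion distances, and Figure 5 compares how much each Laplacian eigenvector contributes to them. A sketch of that common spectral view, assuming a connected weighted graph: both are quadratic forms $\sum_{k\ge2} f(\lambda_k)\,\big(U_{ik}/\sqrt{d_i} - U_{jk}/\sqrt{d_j}\big)^2$ over the eigenpairs of $L_{\mathrm{sym}}$. Here $f(\lambda) = 1/\lambda$ gives the effective resistance, and $f(\lambda) = \mathrm{vol}(G)\,(1-\lambda)^{2t}$ gives the squared diffusion distance $\sum_l (P^t_{il} - P^t_{jl})^2/\pi_l$ with $P = D^{-1}W$ and $\pi_l = d_l/\mathrm{vol}(G)$. These are standard identities written out for this example; the function names are hypothetical.

```python
import numpy as np


def spectral_quadratic_form(A, weight):
    """Returns S_ij = sum_{k>=2} weight(lam_k) * (U_ik/sqrt(d_i) - U_jk/sqrt(d_j))^2,
    using L_sym = I - D^{-1/2} A D^{-1/2} = U diag(lam) U^T (connected graph assumed)."""
    d = A.sum(axis=1)
    Dm = 1 / np.sqrt(d)
    lam, U = np.linalg.eigh(np.eye(len(d)) - Dm[:, None] * A * Dm[None, :])
    V = Dm[:, None] * U[:, 1:]                     # skip the trivial eigenvector
    G = (V * weight(lam[1:])) @ V.T                # G_ij = sum_k w_k V_ik V_jk
    g = np.diag(G)
    return g[:, None] + g[None, :] - 2 * G


def effective_resistance_weights(lam):
    """S is then the effective resistance itself."""
    return 1 / lam


def diffusion_weights(lam, t, vol):
    """S is then the *squared* diffusion distance at time t."""
    return vol * (1 - lam) ** (2 * t)


rng = np.random.default_rng(2)
W = rng.uniform(0.5, 1.5, size=(10, 10))           # toy weighted graph
W = (W + W.T) / 2
np.fill_diagonal(W, 0)
vol = W.sum()                                      # vol(G) = sum of all degrees
R = spectral_quadratic_form(W, effective_resistance_weights)
D2 = spectral_quadratic_form(W, lambda lam: diffusion_weights(lam, t=2, vol=vol))
```

On $0 < \lambda \le 1$ both weight functions decrease with $\lambda$, so low-frequency eigenvectors dominate either distance; increasing $t$ makes the diffusion weights shrink faster away from $\lambda = 0$.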