
A Normalized Bottleneck Distance on Persistence Diagrams and Homology Preservation under Dimension Reduction

Nathan H. May, Bala Krishnamoorthy, Patrick Gambill

TL;DR

This work introduces the normalized bottleneck distance $d_N$ as a scale-invariant counterpart to the standard bottleneck distance for persistence diagrams, enabling robust comparison of point clouds across different scales. By developing a metric-decomposition framework, the authors prove a stability bound $d_N(X,Y) \le \frac{2\|\Delta\|}{\operatorname{diam}(Y)}$ and derive explicit preservation guarantees for Johnson–Lindenstrauss projections, metric MDS, and biLipschitz mappings in terms of $d_N$. They show that DR techniques preserve homology more effectively under $d_N$ than under $d_B$, with concrete bounds expressed through diagram distances and covariance eigenvalues, and they corroborate these results with computational experiments demonstrating improved clustering performance. The findings have practical implications for topology-aware data analysis pipelines, particularly when dimension reduction or scaling is involved, and suggest avenues for applying $d_N$ to broader nonlinear DR methods.
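The scale sensitivity that motivates $d_N$ can be seen in a small, self-contained sketch. It computes only the $H_0$ death times of a Vietoris-Rips filtration (which coincide with minimum-spanning-tree edge lengths) for a noisy circle and for a copy scaled by 8. Two choices here are assumptions made for illustration and are not taken from the paper: dividing death times by the point cloud's diameter is a stand-in normalization suggested by the $\operatorname{diam}(Y)$ term in the stability bound, and the largest gap between sorted death times is only a crude proxy, not the bottleneck distance. The paper's actual definition of $d_N$ may differ.

```python
import math
import random

def pairwise(points):
    """All (distance, i, j) triples with i < j."""
    return [(math.dist(points[i], points[j]), i, j)
            for i in range(len(points)) for j in range(i + 1, len(points))]

def h0_deaths(points):
    """H_0 death times of the Vietoris-Rips filtration, in increasing order.

    Computed as minimum-spanning-tree edge lengths via Kruskal's algorithm,
    with the convention that an edge appears at scale equal to its length.
    """
    parent = list(range(len(points)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    deaths = []
    for d, i, j in sorted(pairwise(points)):
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge merges two components
            parent[ri] = rj
            deaths.append(d)
    return deaths

def diameter(points):
    return max(d for d, _, _ in pairwise(points))

def noisy_circle(n, radius, noise, rng):
    """Hypothetical sampler: n points near a circle of the given radius."""
    pts = []
    for _ in range(n):
        t = rng.uniform(0.0, 2.0 * math.pi)
        r = radius + rng.uniform(-noise, noise)
        pts.append((r * math.cos(t), r * math.sin(t)))
    return pts

rng = random.Random(0)
X = noisy_circle(40, 10.0, 0.5, rng)
Y = [(8.0 * x, 8.0 * y) for x, y in X]  # same shape, 8 times larger

dX, dY = h0_deaths(X), h0_deaths(Y)
diamX, diamY = diameter(X), diameter(Y)

# Gap between sorted death times: raw versus diameter-normalized (assumed normalization).
raw_gap = max(abs(a - b) for a, b in zip(dX, dY))
norm_gap = max(abs(a / diamX - b / diamY) for a, b in zip(dX, dY))
print("raw gap:", raw_gap)
print("normalized gap:", norm_gap)
```

The raw gap grows with the scaling factor even though the two clouds have the same shape, while the diameter-normalized gap vanishes up to floating-point error.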

Abstract

Persistence diagrams (PDs) are used as signatures of point cloud data. Two clouds of points can be compared using the bottleneck distance d_B between their PDs. A potential drawback of this pipeline is that point clouds sampled from topologically similar manifolds can have arbitrarily large d_B when there is a large scaling between them. This situation is typical in dimension reduction frameworks. We define, and study properties of, a new scale-invariant distance between PDs termed normalized bottleneck distance, d_N. In defining d_N, we develop a broader framework called metric decomposition for comparing finite metric spaces of equal cardinality with a bijection. We utilize metric decomposition to prove a stability result for d_N by deriving an explicit bound on the distortion of the bijective map. We then study two popular dimension reduction techniques, Johnson-Lindenstrauss (JL) projections and metric multidimensional scaling (mMDS), and a third class of general biLipschitz mappings. We provide new bounds on how well these dimension reduction techniques preserve homology with respect to d_N. For a JL map f that transforms input X to f(X), we show that d_N(dgm(X),dgm(f(X))) < ε, where dgm(X) is the Vietoris-Rips PD of X, and pairwise distances are preserved by f up to the tolerance 0 < ε < 1. For mMDS, we present new bounds for d_B and d_N between PDs of X and its projection in terms of the eigenvalues of the covariance matrix. And for k-biLipschitz maps, we show that d_N is bounded by the product of (k^2-1)/k and the ratio of diameters of X and f(X). Finally, we use computational experiments to demonstrate the increased effectiveness of using the normalized bottleneck distance for clustering sets of point clouds sampled from different shapes.
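To make the JL tolerance concrete, the following sketch projects a small point cloud with a random Gaussian matrix and measures the empirical distortion of pairwise distances. It is an illustration under stated assumptions rather than the paper's construction: the Gaussian matrix with entries scaled by $1/\sqrt{k}$ is one common way to build a JL map, the squared-distance form $(1-\varepsilon)\|x-y\|^2 \le \|f(x)-f(y)\|^2 \le (1+\varepsilon)\|x-y\|^2$ is the usual statement of the guarantee and may not match the paper's exact convention, and the dimensions and names (`gaussian_projection`, `empirical_epsilon`) are hypothetical.

```python
import math
import random

def gaussian_projection(points, k, rng):
    """Map each point from R^d to R^k with a random Gaussian matrix scaled by 1/sqrt(k)."""
    d = len(points[0])
    scale = 1.0 / math.sqrt(k)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    return [[sum(row[t] * p[t] for t in range(d)) for row in matrix] for p in points]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def empirical_epsilon(points, images):
    """Largest relative change in squared pairwise distance caused by the map."""
    worst = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            ratio = sq_dist(images[i], images[j]) / sq_dist(points[i], points[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst

rng = random.Random(1)
# Assumed toy data: 12 Gaussian points in R^300, projected to R^150.
X = [[rng.gauss(0.0, 1.0) for _ in range(300)] for _ in range(12)]
fX = gaussian_projection(X, 150, rng)

eps_hat = empirical_epsilon(X, fX)
print("empirical distortion of squared distances:", eps_hat)
```

The measured value plays the role of ε for this particular sample. Larger target dimensions `k` typically shrink it, which in turn tightens the bound on d_N quoted in the abstract.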

Paper Structure (16 sections, 27 theorems, 31 equations, 6 figures, 1 table)


Key Result

Theorem (stability of $d_N$)

$d_N(X,Y) \le \frac{2\|\Delta\|}{\operatorname{diam}(Y)}$

Figures (6)

  • Figure 1: Two pairs of point clouds, one without scaling (top row) and another with a large amount of scaling (bottom row), and their 1D persistence diagrams ($H_1$). The red and blue point clouds in the top row (first two figures) are both sampled from noisy circles in the $[-10,10] \times [-10,10]$ box. The third and fourth figures show the births of the hole in each case at diameters of around $7$. The first (red) point cloud in the bottom row is similar to the ones in the top row, but the second (blue) point cloud is sampled from a noisy circle in the $[-80,80] \times [-80,80]$ box. Its hole feature is born at a diameter of around $26$. The bottleneck distance between the pair of PDs is small in the top row, but can be quite large depending on the degree of scaling of the second (blue) point cloud in the bottom row.
  • Figure 2: UMAP reduction of a saddle boundary in $\mathbb{R}^3$ to $\mathbb{R}^2$.
  • Figure 3: Meshes of frogs and chairs from the Free 3D database. Note the varying scales across the models from each class. The frogs chosen are (from left to right): https://free3d.com/3d-model/frog-v1--149240.html, https://free3d.com/3d-model/frog-v1--30593.html, https://free3d.com/3d-model/banjofrog-v1--699349.html, and https://free3d.com/3d-model/frog-v1--64825.html. The chairs chosen are (from left to right): https://free3d.com/3d-model/-folding-chair-metal-v2--311510.html, https://free3d.com/3d-model/folding-chairs-v1--612720.html, https://free3d.com/3d-model/fold-out-chair-folded-v1--372704.html, and https://free3d.com/3d-model/monobloc-chair-v1--691935.html.
  • Figure 4: Point clouds sampled from the meshes of frogs and chairs shown in Figure 3 (top two rows) and point clouds sampled from surfaces of tori at varying scales (bottom row).
  • Figure 5: MDS projection into 2D of the 12 data sets using $d_B$ distances between their $H_1$ PDs. The two tori points on the far right correspond to the first and third data sets in the third row of Figure 4, which have bigger scales compared to the other tori data sets. $3$-means clustering puts these two tori into isolated clusters of their own, with the remaining 10 data sets grouped into the third cluster (as shown by the 10 points on the left).
  • ...and 1 more figures

Theorems & Definitions (52)

  • Theorem \ref{thm:Hom:dN_stabil}
  • Theorem \ref{thm:Hom:JL_dB}
  • Corollary \ref{thm:Hom:JL_dB}
  • Corollary \ref{thm:Hom:JL_dB}
  • Corollary \ref{thm:Hom:JL_dB}
  • Corollary \ref{thm:Hom:JL_dB}
  • Theorem \ref{thm:Hom:JL_dB}: Chazal et al., 2014 ChdeSOu2014
  • Definition \ref{thm:Hom:JL_dB}
  • Definition \ref{thm:Hom:JL_dB}
  • Definition \ref{thm:Hom:JL_dB}
  • ...and 42 more