Pull-back Geometry of Persistent Homology Encodings

Shuang Liang, Renata Turkeš, Jiayi Li, Nina Otter, Guido Montúfar

TL;DR

We propose a methodology based on the pull-back geometry that a PH encoding induces on the data manifold, and show that the pull-back norm correlates with performance on downstream tasks and can therefore guide the choice of a suitable PH encoding.

Abstract

Persistent homology (PH) is a method for generating topology-inspired representations of data. Empirical studies that investigate the properties of PH, such as its sensitivity to perturbations or ability to detect a feature of interest, commonly rely on training and testing an additional model on the basis of the PH representation. To gain more intrinsic insights about PH, independently of the choice of such a model, we propose a novel methodology based on the pull-back geometry that a PH encoding induces on the data manifold. The spectrum and eigenvectors of the induced metric help to identify the most and least significant information captured by PH. Furthermore, the pull-back norm of tangent vectors provides insights about the sensitivity of PH to a given perturbation, or its potential to detect a given feature of interest, and in turn its ability to solve a given classification or regression problem. Experimentally, the insights gained through our methodology align well with the existing knowledge about PH. Moreover, we show that the pull-back norm correlates with the performance on downstream tasks, and can therefore guide the choice of a suitable PH encoding.
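To make the quantities in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy encoding (the vector of pairwise distances of a point cloud) as a stand-in for a PH encoding, a finite-difference Jacobian, and the Euclidean norm on the output space. The names `toy_encoding`, `jacobian_fd`, and `pullback_norm` are hypothetical. The sketch computes the induced metric $J^\top J$, its spectrum, and the pull-back norm of two perturbation directions.

```python
import numpy as np

def toy_encoding(flat_points, n_points, dim):
    """Toy stand-in for a PH encoding (assumption): all pairwise distances."""
    pts = flat_points.reshape(n_points, dim)
    i, j = np.triu_indices(n_points, k=1)
    return np.linalg.norm(pts[i] - pts[j], axis=1)

def jacobian_fd(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f at x (one column per input coordinate)."""
    cols = []
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = eps
        cols.append((f(x + e) - f(x - e)) / (2.0 * eps))
    return np.stack(cols, axis=1)

def pullback_norm(J, v):
    """Norm of the tangent vector v measured through the encoding: ||J v||."""
    return np.linalg.norm(J @ v)

rng = np.random.default_rng(0)
N, D = 6, 2
X = rng.normal(size=(N, D))          # one point cloud = one point on the data manifold
x = X.ravel()

J = jacobian_fd(lambda z: toy_encoding(z, N, D), x)
G = J.T @ J                          # induced (pull-back) metric at X
eigvals = np.linalg.eigvalsh(G)[::-1]  # spectrum, largest first

# Two perturbation directions, normalized to unit Euclidean length.
rotation = (X @ np.array([[0.0, 1.0], [-1.0, 0.0]])).ravel()  # infinitesimal rotation
scaling = X.ravel()                                           # infinitesimal scaling
rotation /= np.linalg.norm(rotation)
scaling /= np.linalg.norm(scaling)

rot_norm = pullback_norm(J, rotation)
scale_norm = pullback_norm(J, scaling)
print("spectrum:", np.round(eigvals, 4))
print("pull-back norm (rotation):", rot_norm)
print("pull-back norm (scaling): ", scale_norm)
```

Because pairwise distances are invariant under rotation, the rotation direction has a pull-back norm that is numerically zero for this toy encoding, while scaling does not. This mirrors how the pull-back norm is meant to expose which perturbations an encoding is sensitive to, without training a downstream model.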

Paper Structure

This paper contains 68 sections, 5 theorems, 75 equations, 29 figures, and 1 table.

Key Result

Proposition 4

Let $\mathcal{M}$ be the set containing all point clouds in $\mathbb{R}^D$ with $N$ distinct points, and $d_W$ be the $2$-Wasserstein distance on $\mathcal{M}$. For any point cloud $X\in\mathcal{M}$, there exists a Wasserstein ball $B_W(X,\varepsilon_X)$ and an injective mapping $\xi_X: B_W(X,\varep

Figures (29)

  • Figure 1: Schematic pipeline of our proposed method (comparing it with performance-based testing).
  • Figure 2: The pipeline for constructing a persistence image described in Section \ref{sec:intro-PI}. From left to right: (a) input point cloud; (b) Vietoris-Rips filtration built on the point cloud; (c) 1-dimensional persistence diagram; (d) birth-lifespan pairs (transformed 1-dimensional persistence diagram); and (e) persistence image.
  • Figure 3: The space of point clouds forms a manifold, which in this figure is depicted as a torus; each point on this manifold is a point cloud. Left: vector fields on the data manifold correspond to variations of the point clouds; in this illustration, the red arrows correspond to "rotation" and the blue arrows to "shearing". Right: a continuous feature on the data manifold induces a gradient vector field; the figure illustrates a binary feature, where the dashed line is the class boundary, and the continuous feature value represents the probability of the data point belonging to the "red" class.
  • Figure 4: A visualization of the Jacobian map and the pull-back norm. Here $f$ denotes an encoding map from the input space $\mathcal{M}$ to the output space $\mathcal{N}$. Left: the Jacobian of the encoding sends tangent vectors in the tangent space $T_X\mathcal{M}$ of $\mathcal{M}$ to tangent vectors in the tangent space $T_{f(X)}\mathcal{N}$ of $\mathcal{N}$. Right: the pull-back norm of a tangent vector on $\mathcal{M}$ measures by what amount the output of the encoding would change in response to the variation of the input by that tangent vector. In this schematic illustration, the pull-back norm of the red vector ("noising") is larger than the pull-back norm of the blue vector ("shearing").
  • Figure 5: Left: The normalized spectrum of the Jacobian for different encodings. Shown are the mean and standard error of the ordered normalized singular values over different input point clouds. Right: The top two eigenvectors of the Jacobian for the PH encoding constructed on the Rips filtration at a particular input point cloud.
  • ...and 24 more figures
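
The relationship sketched in the Figure 4 caption can be written as a short formula. The notation here is an assumption for illustration rather than the paper's own: $J_f(X)$ denotes the Jacobian of the encoding $f$ at $X$, and the output space $\mathcal{N}$ is taken to carry the Euclidean norm.

$$
\|v\|_{f} \;=\; \|J_f(X)\,v\|_2 \;=\; \sqrt{v^\top J_f(X)^\top J_f(X)\,v}, \qquad v \in T_X\mathcal{M}.
$$

Under these assumptions, $J_f(X)^\top J_f(X)$ plays the role of the induced metric on $\mathcal{M}$, and its eigenvalues are the squared singular values of the Jacobian.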

Theorems & Definitions (16)

  • Definition 1: Perturbation vector field
  • Definition 2: Gradient vector field
  • Definition 3: Average pull-back norm
  • Proposition 4
  • proof
  • Corollary 5
  • proof
  • Definition 6: Riemannian distance
  • Lemma 7 (do Carmo, Riemannian Geometry, 1992)
  • Proposition 8
  • ...and 6 more