A Class of Topological Pseudodistances for Fast Comparison of Persistence Diagrams

Rolando Kindelan Nuñez, Mircea Petrache, Mauricio Cerda, Nancy Hitschfeld

TL;DR

This paper introduces a class of pseudodistances called Extended Topological Pseudodistances (ETDs) with tunable complexity. At the high-complexity extreme they can approximate Sliced and classical Wasserstein distances; at the low-complexity extreme they are computationally lighter and close to Persistence Statistics. Users can therefore interpolate between the two.

Abstract

Persistence diagrams (PDs) play a central role in topological data analysis and are used in an ever-increasing variety of applications. Comparing PD data requires computing distances among large sets of PDs, using metrics that are accurate, theoretically sound, and fast to compute. Such metrics are lacking, especially for denser multi-dimensional PDs. On the one hand, Wasserstein-type distances offer high accuracy and theoretical guarantees but incur a high computational cost. On the other hand, distances between vectorizations such as Persistence Statistics (PSs) are cheaper to compute but lack accuracy guarantees and, in general, are not guaranteed to distinguish PDs (i.e., two different PDs may have equal PS vectors). In this work we introduce a class of pseudodistances called Extended Topological Pseudodistances (ETDs), which have tunable complexity: at the high-complexity extreme they can approximate Sliced and classical Wasserstein distances, while at the low-complexity extreme they are computationally lighter and close to Persistence Statistics, so users can interpolate between the two. We give theoretical comparisons that place the new distances at an intermediate level between persistence vectorizations and Wasserstein distances. We also verify experimentally that ETDs outperform PSs in accuracy and outperform Wasserstein and Sliced Wasserstein distances in computational cost.

Paper Structure

This paper contains 28 sections, 3 theorems, 23 equations, 6 figures, 6 tables, 4 algorithms.

Key Result

Proposition 1

For two multisets $\mathsf P_1,\mathsf P_2\subseteq \mathbb R$ the distances $W_p(\mathsf P_1, \mathsf P_2)$ can be computed in $O(N\log N)$ time.
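
One way to see where the $O(N\log N)$ bound can come from is the standard sorting argument for one-dimensional optimal transport: for $p\ge 1$, matching the $i$-th smallest point of one multiset to the $i$-th smallest point of the other is optimal, so the cost is dominated by the two sorts. The sketch below is illustrative only and is not taken from the paper's proof. It assumes both multisets have the same cardinality $N$ and uses the un-normalized convention $W_p^p=\min_\sigma\sum_i |x_i-y_{\sigma(i)}|^p$; the function name `wasserstein_1d` is hypothetical.

```python
def wasserstein_1d(xs, ys, p=2.0):
    """p-Wasserstein distance between two equal-size multisets of reals.

    Assumptions (not from the paper): equal cardinality, p >= 1, and the
    un-normalized sum convention (no 1/N factor).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes multisets of equal cardinality")
    if p < 1:
        raise ValueError("the sorted matching is only guaranteed optimal for p >= 1")

    # Sorting is the O(N log N) step; everything after it is linear.
    a = sorted(float(x) for x in xs)
    b = sorted(float(y) for y in ys)

    # Match the i-th smallest point of xs with the i-th smallest point of ys.
    total = sum(abs(u - v) ** p for u, v in zip(a, b))
    return total ** (1.0 / p)


# Usage: input order does not matter, since both multisets are sorted first.
print(wasserstein_1d([5, 0, 2], [7, 1, 2], p=1))  # prints 3.0
```

Multisets of different sizes, or persistence diagrams where points may also be matched to the diagonal, need additional handling that this sketch does not attempt.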

Figures (6)

  • Figure 1: Example of data from Experiment 2: for each autoencoder layer, we plot the corresponding PD for $H_0, H_1, H_2$, ordered from the input layer (left, first row) to the output/reconstruction layer (right, second row), for a total of $7$ layers. The distance of each persistence diagram to the first one, with respect to different metrics, is plotted in Fig. 2.
  • Figure 2: Example data from Experiment 2: for each homology dimension $0,1,2$, we plot the values of $\log(\mathsf{dist}(P_i,P_0)/\mathsf{WD}(P_i,P_0))$ for $0\le i\le 6$, where $P_i$ is the PD of the $i$-th layer and $\mathsf{dist}$ is one of the metrics under comparison. For completeness, we also include the Fisher Kernel distance, which is much less discriminative than the other metrics.
  • Figure 3: Datasets
  • Figure 4: The ReLU latent space information and persistence diagrams up to $H_2$.
  • Figure 5: The LReLU latent space information and persistence diagrams up to $H_2$.
  • ...and 1 more figure

Theorems & Definitions (10)

  • Definition 1: Wasserstein distances
  • Proposition 1
  • Proof: Proof sketch
  • Remark 1
  • Remark 2
  • Definition 2: Extended Topological Pseudodistances
  • Lemma 1
  • Remark 3: invariance properties of $\mathsf{ETD}_A$
  • Theorem 1: Computational cost of $\mathsf{ETD}_A$
  • Definition 3: Sliced Wasserstein Distance