A Survey of Vectorization Methods in Topological Data Analysis

Dashti Ali, Aras Asaad, Maria-Jose Jimenez, Vidit Nanda, Eduardo Paluzo-Hidalgo, Manuel Soriano-Trigueros

TL;DR

Surprisingly, the best-performing method turns out to be a simple vectorization consisting of only a few elementary summary statistics.

Abstract

Attempts to incorporate topological information into supervised learning tasks have produced several techniques for vectorizing persistent homology barcodes. In this paper, we study thirteen such methods. Besides describing an organizational framework for these methods, we comprehensively benchmark them on three well-known classification tasks. Surprisingly, we discover that the best-performing method is a simple vectorization consisting of only a few elementary summary statistics. Finally, we provide a convenient web application designed to facilitate exploration and experimentation with various vectorization methods.
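The sketch below shows in Python what vectorizing a barcode by elementary summary statistics can look like. The particular statistics are an assumption chosen for illustration and may not match the exact feature list used in the paper. For each of the births, deaths, midpoints and lengths it computes the mean, population standard deviation, median, interquartile range, full range, and the 10th, 25th, 75th and 90th percentiles. It then appends the number of bars and the persistent entropy. The function name `persistence_statistics` is hypothetical, and the barcode is assumed non-empty with all bars finite.

```python
import math
from statistics import mean, median, pstdev


def _percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def _summary(values):
    """Nine elementary statistics of one list of numbers (illustrative choice)."""
    s = sorted(values)
    p10, p25, p75, p90 = (_percentile(s, q) for q in (10, 25, 75, 90))
    return [mean(s), pstdev(s), median(s), p75 - p25, s[-1] - s[0], p10, p25, p75, p90]


def persistence_statistics(barcode):
    """Hypothetical fixed-length feature vector for a barcode of (birth, death) pairs.

    Assumes the barcode is non-empty and every bar is finite (death >= birth).
    """
    births = [b for b, _ in barcode]
    deaths = [d for _, d in barcode]
    midpoints = [(b + d) / 2 for b, d in barcode]
    lengths = [d - b for b, d in barcode]
    features = []
    for values in (births, deaths, midpoints, lengths):
        features.extend(_summary(values))  # 9 statistics per quantity
    features.append(float(len(barcode)))  # number of bars
    # Persistent entropy: Shannon entropy of the normalised bar lengths.
    total = sum(lengths)
    features.append(-sum((l / total) * math.log(l / total) for l in lengths if l > 0))
    return features
```

Every barcode maps to the same 38 numbers, however many bars it has. The output can therefore be passed directly to any standard classifier. Infinite bars would have to be truncated or dropped before calling the function.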

Paper Structure (35 sections, 2 theorems, 26 equations, 9 figures, 5 tables)

Key Result

Theorem 1.1

For every persistence module $(V,a)$, there exists a unique set $\text{\bf Bar}(V,a)$ of subintervals of $[0,n]$ along with a unique function $\text{\bf Bar}(V,a) \to \mathbb{Z}_{>0}$ denoted $[p,q] \mapsto \mu_{p,q}$ for which we have an isomorphism

Figures (9)

  • Figure 1: An increasing family of cell complexes built around a point cloud dataset; the associated barcode in dimensions 0 (blue) and 1 (red) catalogues the connected components and cycles respectively.
  • Figure 2: Samples from datasets used in our experiments
  • Figure 3: A screenshot of the web app
  • Figure 4: Intervals in barcodes of dimensions $0$ and $1$ as displayed by the web app.
  • Figure 5: The Persistence Statistics vectorization as shown in the web app.
  • ...and 4 more figures
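The Figure 1 caption explains that the degree-0 barcode records connected components. The sketch below computes it for a point cloud. It assumes the cell complexes are Vietoris-Rips complexes, with the scale measured as pairwise Euclidean distance. Conventions differ here, and the paper's construction may not be this one. A union-find structure tracks the components: every point is born at scale 0, and each edge that first joins two components kills one bar. The function name `zero_dim_barcode` is hypothetical.

```python
import math
from itertools import combinations


def zero_dim_barcode(points):
    """Degree-0 barcode of the Vietoris-Rips filtration of a finite point cloud.

    Assumption: an edge enters the filtration at the Euclidean distance between
    its endpoints. All points are born at 0, so when two components merge it
    does not matter which one is recorded as dying.
    """
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Edges sorted by the scale at which they appear in the filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one bar dies
            parent[ri] = rj
            bars.append((0.0, length))
    bars.append((0.0, math.inf))  # the single component that never dies
    return bars
```

For `[(0, 0), (1, 0), (5, 0)]` the function returns `[(0.0, 1.0), (0.0, 4.0), (0.0, inf)]`. The loop is Kruskal's algorithm, so the finite death times are exactly the edge lengths of a minimum spanning tree of the point cloud.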

Theorems & Definitions (15)

  • Theorem 1.1
  • Theorem 1.2
  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Definition 2.6
  • Definition 2.7
  • Definition 2.8
  • ...and 5 more