Discrete transforms of quantized persistence diagrams

Michael Etienne Van Huffel, Olympio Hacquard, Vadim Lebovici, Matteo Palo

TL;DR

This work introduces Qupid (QUantized Persistence and Integral transforms of Diagrams), a novel and simple method for vectorizing persistence diagrams. It has very low computational cost while remaining highly competitive with state-of-the-art methods across numerous classification tasks on both synthetic and real-world datasets.

Abstract

Topological data analysis leverages topological features to analyze datasets, with applications in diverse fields such as medical sciences and biology. A key tool of this theory is the persistence diagram, which encodes topological information but is difficult to integrate into standard machine learning pipelines. We introduce Qupid (QUantized Persistence and Integral transforms of Diagrams), a novel and simple method for vectorizing persistence diagrams. Qupid first uses a binning procedure to turn persistence diagrams into finite measures on a grid, and then applies discrete transforms to these measures. Its key features are log-scaled grids, which emphasize information contained near the diagonal of persistence diagrams, and discrete transforms, which enhance and efficiently encode the resulting topological information. We conduct an in-depth experimental analysis of Qupid, showing that the simplicity of our method yields very low computational cost while preserving highly competitive performance compared to state-of-the-art methods across numerous classification tasks on both synthetic and real-world datasets. Finally, we provide experimental evidence that our method is robust to a decrease in grid resolution.
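
The two-step pipeline described in the abstract (binning on a log-scaled grid, then a discrete transform) can be illustrated with a minimal, standard-library-only sketch. Several choices here are assumptions made for illustration and are not taken from the paper: the function names (`log_scale`, `quantize`, `haar2d`, `vectorize`), the log-scaling map $u \mapsto \log(1+\alpha u)/\log(1+\alpha)$, the rescaling of coordinates to $[0,1]$, and the use of a single-level Haar transform (the Daubechies wavelet of order 1) as a stand-in for the higher-order wavelets mentioned in the figure captions.

```python
import math

def log_scale(u, alpha):
    """Monotone map of [0, 1] onto [0, 1] that stretches values near 0.

    Assumed form, not taken from the paper: log(1 + alpha*u) / log(1 + alpha).
    """
    return math.log1p(alpha * u) / math.log1p(alpha)

def quantize(diagram, n_bins, alpha=None):
    """Count diagram points per cell of an n_bins x n_bins grid.

    `diagram` is a non-empty list of (birth, death) pairs, 0 <= birth <= death.
    Points are moved to birth-persistence coordinates and rescaled to [0, 1]
    (an assumed normalization). With alpha=(a_birth, a_pers) the grid is
    log-scaled, which gives finer cells to low-persistence points.
    """
    births = [b for b, _ in diagram]
    pers = [d - b for b, d in diagram]
    b_max = max(births) or 1.0  # avoid dividing by zero
    p_max = max(pers) or 1.0
    grid = [[0.0] * n_bins for _ in range(n_bins)]
    for b, p in zip(births, pers):
        u, v = b / b_max, p / p_max
        if alpha is not None:
            u, v = log_scale(u, alpha[0]), log_scale(v, alpha[1])
        row = min(int(v * n_bins), n_bins - 1)  # persistence axis
        col = min(int(u * n_bins), n_bins - 1)  # birth axis
        grid[row][col] += 1.0
    return grid

def haar2d(grid):
    """One level of the orthonormal 2D Haar transform (even grid size)."""
    half = len(grid) // 2
    bands = [[[0.0] * half for _ in range(half)] for _ in range(4)]
    approx, d_rows, d_cols, d_diag = bands
    for i in range(half):
        for j in range(half):
            a, b = grid[2 * i][2 * j], grid[2 * i][2 * j + 1]
            c, d = grid[2 * i + 1][2 * j], grid[2 * i + 1][2 * j + 1]
            approx[i][j] = (a + b + c + d) / 2  # local average
            d_rows[i][j] = (a + b - c - d) / 2  # difference between rows
            d_cols[i][j] = (a - b + c - d) / 2  # difference between columns
            d_diag[i][j] = (a - b - c + d) / 2  # diagonal difference
    return approx, d_rows, d_cols, d_diag

def vectorize(diagram, n_bins=50, alpha=(500.0, 500.0)):
    """Flatten the four coefficient bands into one feature vector."""
    bands = haar2d(quantize(diagram, n_bins, alpha))
    return [x for band in bands for row in band for x in row]
```

Calling `vectorize([(0.0, 0.1), (0.2, 0.25), (0.5, 1.5)])` returns a list of `n_bins * n_bins` coefficients that can be fed to a standard classifier. The defaults mirror the grid size $50\times 50$ and parameter $\alpha = (500,500)$ quoted in the caption of Figure 2; everything else should be read as a sketch under the stated assumptions rather than as the authors' implementation.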

Paper Structure (13 sections, 9 equations, 6 figures, 7 tables)

This paper contains 13 sections, 9 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Union of balls at radii $t_1$ (b) and $t_2$ (c) with $t_1<t_2$ centred on a point cloud sampled around two circles (a). Taking every radius $t$ defines the Čech filtration and the corresponding persistence diagrams (d).
  • Figure 2: An example of a degree-1 persistence diagram in birth-persistence coordinates obtained from the Čech filtration of a point cloud (a), its associated quantized persistence diagram with a regular grid of size $50\times 50$ (b), its associated quantized diagram with a log-scaled grid of size $50\times 50$ and parameter $\alpha = (500,500)$ (c), and the discrete Daubechies transform of order $2$ of this quantized diagram (d), showing the approximation coefficients and the vertical, horizontal, and diagonal details from top left to bottom right.
  • Figure 3: Examples of point clouds from the ORBIT5K dataset.
  • Figure 4: t-SNE plot of Qupid for the Coiflet transform of order $2$ on the tumor immune cells dataset.
  • Figure 5: Importance of the coefficients of the Qupid-coif2 vectorization for the classification task CD68$^+$ vs FoxP3$^+$.
  • ...and 1 more figure

Theorems & Definitions (3)

  • Definition 3.1
  • Definition 3.2
  • Definition 4.1