xPerT: Extended Persistence Transformer

Sehun Kim

TL;DR

A novel transformer architecture called xPerT is proposed. It is more scalable than Persformer, an existing transformer for persistence diagrams, and requires neither complex preprocessing steps nor extensive hyperparameter tuning, making it easy to use in practice.

Abstract

A persistence diagram provides a compact summary of persistent homology, which captures the topological features of a space at different scales. However, because a persistence diagram is a set, incorporating it as a feature into a machine learning framework is challenging. Several methods have been proposed to use persistence diagrams as input for machine learning models, but they often require complex preprocessing steps and extensive hyperparameter tuning. In this paper, we propose a novel transformer architecture called the Extended Persistence Transformer (xPerT), which is more scalable than Persformer, an existing transformer for persistence diagrams. xPerT reduces GPU memory usage by over 90% and improves accuracy on multiple datasets. Additionally, xPerT does not require complex preprocessing steps or extensive hyperparameter tuning, making it easy to use in practice. Our code is available at https://github.com/sehunfromdaegu/xpert.

Paper Structure

This paper contains 37 sections, 3 theorems, 33 equations, 6 figures, 8 tables.

Key Result

Proposition 1

Let $D$ and $D'$ be two persistence diagrams, and let $\delta < W(D, D')$. Then we have:

Figures (6)

  • Figure 1: Scaling. Comparison of computational cost between persistence diagram transformer models in terms of training time and GPU memory usage (GB). The experiment was conducted using a batch size of 64 for the PROTEINS and IMDB-B datasets, and a batch size of 16 for the ORBIT5K dataset, as Persformer could not fit on our GPU (RTX 3090) with a batch size of 32.
  • Figure 2: Sublevel Set Filtration. Six sublevel sets of the height function are shown. As $c$ increases, the topology of $X_{c}$ changes, which is represented by the ordinary persistence diagram. However, ordinary persistence cannot detect the appearance of the blue upright arm, which can instead be reflected by the superlevel set filtration $(X^{c})_{c \in \mathbb{R}}$ as $c$ decreases.
  • Figure 3: Persistence Diagram. The extended persistence diagram includes information about the blue upright arm and the maximum value of $f$, which are not present in the ordinary persistence diagram. (Left) A topological space equipped with a height function $f$. (Center) Extended persistence diagram. (Right) Ordinary persistence diagram.
  • Figure 4: Projection of Persistence Diagram. Each point in $D_r$ is projected onto the center of its corresponding grid cell using the projection map $\Pi_\delta$.
  • Figure 5: xPerT Overview. (Left, top) The persistence diagram is pixelized and split into fixed-size patches. (Left, bottom) These patches are linearly transformed into token vectors, with a [cls] token added. Empty patches are excluded from the transformer input. (Right) The token vectors are processed by the transformer model, and the output of the [cls] token is fed to the linear head for classification.
  • ...and 1 more figure
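
The preprocessing described in the captions of Figures 4 and 5 (projecting points to grid-cell centers, pixelizing the diagram, splitting it into patches, and dropping empty patches) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the square grid starting at the origin, the cell size `delta`, the grid size `n_cells`, and the use of point counts as pixel values are all assumptions made for illustration.

```python
import numpy as np


def project_to_grid(points, delta):
    """Snap each (birth, death) point to the center of its delta-by-delta cell.

    Hypothetical stand-in for the projection map Pi_delta; assumes the grid
    is aligned with the origin.
    """
    points = np.asarray(points, dtype=float)
    return (np.floor(points / delta) + 0.5) * delta


def pixelize(points, delta, n_cells):
    """Count diagram points per cell on an n_cells-by-n_cells grid.

    Assumption: pixel value = number of points in the cell. Points outside
    the grid are clipped to the nearest border cell.
    """
    points = np.asarray(points, dtype=float)
    idx = np.clip(np.floor(points / delta).astype(int), 0, n_cells - 1)
    image = np.zeros((n_cells, n_cells), dtype=np.int64)
    # np.add.at accumulates correctly when several points share a cell.
    np.add.at(image, (idx[:, 0], idx[:, 1]), 1)
    return image


def nonempty_patches(image, patch):
    """Split a square image into flattened patch-by-patch blocks.

    Returns the non-empty blocks and their row-major patch indices, so a
    position embedding could still be looked up for each kept patch.
    """
    n = image.shape[0]
    if image.shape[1] != n or n % patch != 0:
        raise ValueError("image must be square and divisible by patch size")
    k = n // patch
    blocks = (
        image.reshape(k, patch, k, patch)
        .transpose(0, 2, 1, 3)
        .reshape(k * k, patch * patch)
    )
    keep = blocks.sum(axis=1) > 0
    return blocks[keep], np.flatnonzero(keep)


# Toy diagram with three points on a 4x4 grid of cell size 0.25.
diagram = [[0.10, 0.90], [0.12, 0.88], [0.60, 0.70]]
image = pixelize(diagram, delta=0.25, n_cells=4)
tokens, positions = nonempty_patches(image, patch=2)
```

In this toy example only two of the four patches contain points, so only two patch vectors would be passed on to the linear embedding. Skipping empty patches in this way shortens the token sequence, which is consistent with the memory savings the abstract reports, although the sketch itself makes no claim about the actual model.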

Theorems & Definitions (13)

  • Definition 1: Wasserstein Distance
  • Definition 2: Graph Laplacian
  • Definition 3: Heat Kernel Signature
  • Definition 4: Projection of Persistence Diagram
  • Proposition 1
  • proof
  • Remark 1
  • Definition 5: Pixelized Persistence Diagram
  • Lemma 1
  • proof
  • ...and 3 more