Sparsification of the Generalized Persistence Diagrams for Scalability through Gradient Descent

Mathieu Carrière; Seunghyun Kim; Woojin Kim

TL;DR

This paper tackles the computational bottleneck of generalized persistence diagrams (GPDs) in multi-parameter persistence by sparsifying the interval domain through gradient descent. It introduces the sparse erosion distance $\hat{d}_{\mathrm{E}}$ to compare GPDs defined over different interval sets, and derives a closed-form expression for this distance on practical interval collections, which enables efficient optimization. The loss $\mathcal{L}_{\hat{d}_{\mathrm{E}},m}$ is shown to be Lipschitz, convexly vectorizable, and differentiable almost everywhere, making gradient-based search feasible. Numerical experiments on time-series data demonstrate significant speedups in GPD computation while maintaining classification performance, highlighting the method’s potential for scalable multi-parameter topological analysis. The work also provides open-source code and outlines directions for extending the approach to richer interval shapes and other GRI-based descriptors.

Abstract

The generalized persistence diagram (GPD) is a natural extension of the classical persistence barcode to the setting of multi-parameter persistence and beyond. The GPD is defined as an integer-valued function whose domain is the set of intervals in the indexing poset of a persistence module, and is known to capture richer topological information than its single-parameter counterpart. However, computing the GPD is computationally prohibitive due to the sheer size of the interval set. Restricting the GPD to a subset of intervals makes this complexity manageable, at the cost of some discriminating power. Identifying and computing an effective restriction of the domain that minimizes this loss of discriminating power remains an open challenge. In this work, we introduce a novel method for optimizing the domain of the GPD through gradient descent. To achieve this, we introduce a loss function tailored to the selection of intervals, balancing computational efficiency and discriminative accuracy. The design of the loss function is based on the known erosion stability property of the GPD. We showcase the efficiency of our sparsification method for dataset classification in supervised machine learning. Experimental results demonstrate that our sparsification method significantly reduces the time required to compute the GPDs associated with several datasets, while maintaining classification accuracies comparable to those achieved using full GPDs. Our method thus opens the way to applying GPD-based methods at an unprecedented scale.
