Sparsification of the Generalized Persistence Diagrams for Scalability through Gradient Descent

Mathieu Carrière, Seunghyun Kim, Woojin Kim

TL;DR

This paper tackles the computational bottleneck of generalized persistence diagrams (GPDs) in multi-parameter persistence by proposing a gradient-descent-based sparsification of the interval domain. It introduces the sparse erosion distance $\hat{d}_{\mathrm{E}}$ to compare GPDs defined over different interval sets, and derives a closed-form expression for this distance on practical interval collections, enabling efficient optimization. The loss $\mathcal{L}_{\hat{d}_{\mathrm{E}},m}$ is shown to be Lipschitz, convexly vectorizable, and differentiable almost everywhere, making gradient-based search feasible. Numerical experiments on time-series data demonstrate significant speedups in GPD computation while maintaining classification performance, highlighting the method’s potential for scalable multi-parameter topological analysis. The work also provides open-source code and outlines directions for extending the approach to richer interval shapes and to other descriptors based on the generalized rank invariant (GRI).

Abstract

The generalized persistence diagram (GPD) is a natural extension of the classical persistence barcode to the setting of multi-parameter persistence and beyond. The GPD is defined as an integer-valued function whose domain is the set of intervals in the indexing poset of a persistence module, and is known to be able to capture richer topological information than its single-parameter counterpart. However, computing the GPD is computationally prohibitive due to the sheer size of the interval set. Restricting the GPD to a subset of intervals provides a way to manage this complexity, compromising discriminating power to some extent. However, identifying and computing an effective restriction of the domain that minimizes the loss of discriminating power remains an open challenge. In this work, we introduce a novel method for optimizing the domain of the GPD through gradient descent optimization. To achieve this, we introduce a loss function tailored to optimize the selection of intervals, balancing computational efficiency and discriminative accuracy. The design of the loss function is based on the known erosion stability property of the GPD. We showcase the efficiency of our sparsification method for dataset classification in supervised machine learning. Experimental results demonstrate that our sparsification method significantly reduces the time required for computing the GPDs associated to several datasets, while maintaining classification accuracies comparable to those achieved using full GPDs. Our method thus opens the way for the use of GPD-based methods to applications at an unprecedented scale.
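
To make the optimization idea concrete, the following is a minimal sketch of subgradient descent on a toy loss that is Lipschitz and differentiable almost everywhere. It is not the paper's sparse erosion distance or its loss: the max-min coverage loss, the encoding of intervals as points in $\mathbb{R}^d$, and the names `coverage_loss`, `subgradient`, and `sparsify` are all assumptions made for illustration only.

```python
import numpy as np


def coverage_loss(targets, centers):
    """Toy loss: max over targets of the l-inf distance to the nearest center.

    Returns the loss and the indices (i, j) of the target/center pair
    that achieves it.
    """
    # Pairwise l-inf distances, shape (n_targets, n_centers).
    d = np.abs(targets[:, None, :] - centers[None, :, :]).max(axis=2)
    nearest = d.argmin(axis=1)                      # nearest center per target
    per_target = d[np.arange(len(targets)), nearest]
    worst = per_target.argmax()                     # worst-covered target
    return per_target[worst], worst, nearest[worst]


def subgradient(targets, centers):
    """One subgradient of the toy loss with respect to the centers."""
    loss, i, j = coverage_loss(targets, centers)
    grad = np.zeros_like(centers)
    diff = centers[j] - targets[i]
    k = np.abs(diff).argmax()       # coordinate achieving the l-inf norm
    grad[j, k] = np.sign(diff[k])   # only the active center/coordinate moves
    return loss, grad


def sparsify(targets, m, steps=500, lr=0.05, seed=0):
    """Pick m centers by subgradient descent, keeping the best iterate."""
    targets = np.asarray(targets, dtype=float)
    rng = np.random.default_rng(seed)
    start = rng.choice(len(targets), size=m, replace=False)
    centers = targets[start].copy()
    best_loss, best = np.inf, centers.copy()
    for t in range(steps):
        loss, grad = subgradient(targets, centers)
        if loss < best_loss:
            best_loss, best = loss, centers.copy()
        centers -= lr / np.sqrt(t + 1) * grad       # diminishing step size
    return best, best_loss
```

Calling `sparsify(points, m=5)` on an `(n, d)` array returns the best set of 5 centers found and its loss. Because each step updates only the center that currently achieves the max-min, the sketch also hints at why losses built from maxima and minima can behave non-smoothly during training.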

Paper Structure

This paper contains 17 sections, 9 theorems, 17 equations, 3 figures, 3 tables.

Key Result

Proposition 5

$\hat{d}_{\mathrm{E}}$ is an extended pseudometric. (See A:pseudo for the proof.)

Figures (3)

  • Figure 3:
  • Figure 4: The parametrization of $I_r$ and $J_s$
  • Figure 5: Loss decrease across gradient descent iterations. The loss value stays on a plateau for the first $\sim$300 iterations because, during these iterations, the parameters in $\mathcal{J}$ that are updated by gradient descent are not yet the ones achieving the maxima and minima in the closed-form formula provided in \ref{thm_distance_btw_collections_intvs}\ref{item:between_2,1_intervals}.

Theorems & Definitions (17)

  • Definition 1: clause2022discriminating
  • Remark 2: clause2022discriminating and botnan2021signed
  • Definition 3: clause2022discriminating
  • Definition 4: Sparse erosion distance between GPDs relative to sampled intervals
  • Proposition 5
  • Proposition 6
  • Corollary 7
  • Lemma 8
  • Theorem 9
  • Remark 10
  • ...and 7 more