Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures

David Loiseaux, Luis Scoccola, Mathieu Carrière, Magnus Bakke Botnan, Steve Oudot

TL;DR

The paper tackles stable, scalable vectorization of multiparameter persistent homology (MPH) by treating signed barcodes as signed measures. It introduces two vectorizations, one based on convolution and one based on a sliced Wasserstein kernel, both with provable Lipschitz stability, enabling reliable feature extraction from MPH descriptors. The Hilbert decomposition signed measure provides a practical, efficient representation, while Euler-based signed measures offer an alternative; both feed into stable, computable vectorizations. Empirical results on point clouds, graphs, and virtual screening show competitive performance against state-of-the-art topology-based methods and strong baselines, highlighting the method’s potential for robust topological ML pipelines. The work also discusses limitations and future directions, including differentiability of descriptors and integration with neural architectures for learned, data-adaptive discretizations.

Abstract

Persistent homology (PH) provides topological descriptors for geometric data, such as weighted graphs, which are interpretable, stable to perturbations, and invariant under, e.g., relabeling. Most applications of PH focus on the one-parameter case -- where the descriptors summarize the changes in topology of data as it is filtered by a single quantity of interest -- and there is now a wide array of methods enabling the use of one-parameter PH descriptors in data science, which rely on the stable vectorization of these descriptors as elements of a Hilbert space. Although the multiparameter PH (MPH) of data that is filtered by several quantities of interest encodes much richer information than its one-parameter counterpart, the scarceness of stability results for MPH descriptors has so far limited the available options for the stable vectorization of MPH. In this paper, we aim to bring together the best of both worlds by showing how the interpretation of signed barcodes -- a recent family of MPH descriptors -- as signed measures leads to natural extensions of vectorization strategies from one parameter to multiple parameters. The resulting feature vectors are easy to define and to compute, and provably stable. While, as a proof of concept, we focus on simple choices of signed barcodes and vectorizations, we already see notable performance improvements when comparing our feature vectors to state-of-the-art topology-based methods on various types of data.

Paper Structure

This paper contains 37 sections, 14 theorems, 26 equations, 4 figures, and 3 tables.

Key Result

Proposition 1

Let $\mu \in \mathcal{M}_0(\mathbb{R}^n)$ be a finite signed point measure with $\mu^+ = \sum_{i} \delta_{x_i}$ and $\mu^- = \sum_{i} \delta_{y_i}$, where $X = \{x_1, \dots, x_k\}$ and $Y = \{y_1, \dots, y_k\}$ are lists of points of $\mathbb{R}^n$. Then, Moreover, if $n=1$ and $X$ and $Y$ are such that $x_i \leq x_{i+1}$ and $y_i \leq y_{i+1}$ for all $1 \leq i \leq k-1$, then the above minimum
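The displayed formula of Proposition 1 is missing from this extract, so the following is only a sketch under stated assumptions. It assumes that the omitted quantity is a transport-type cost given by a minimum, over all bijections between $X$ and $Y$, of the summed distances $\|x_i - y_{\gamma(i)}\|$, and that for $n=1$ the minimum is reached by matching sorted points in order. The choice of the $\ell^1$ ground norm and the function names are illustrative and not taken from the paper.

```python
from itertools import permutations

def matching_cost(xs, ys, perm):
    """Sum of l1 distances between x_i and y_perm[i] (assumed ground norm)."""
    return sum(
        sum(abs(a - b) for a, b in zip(xs[i], ys[j]))
        for i, j in enumerate(perm)
    )

def min_matching_cost(xs, ys):
    """Brute-force minimum over all bijections X -> Y; only viable for tiny k."""
    assert len(xs) == len(ys), "positive and negative parts need equal mass"
    return min(matching_cost(xs, ys, p) for p in permutations(range(len(ys))))

def sorted_matching_cost_1d(xs, ys):
    """Case n = 1: match the i-th smallest x to the i-th smallest y."""
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys)))

# Hypothetical positive points X and negative points Y on the real line.
X = [0.0, 2.0, 5.0]
Y = [1.0, 2.5, 4.0]

brute = min_matching_cost([(x,) for x in X], [(y,) for y in Y])
fast = sorted_matching_cost_1d(X, Y)
print(brute, fast)
```

On this toy input the brute-force search and the sorted matching return the same cost, which is the behaviour the one-dimensional part of the statement appears to describe. The brute-force search is factorial in $k$ and serves only as a check.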

Figures (4)

  • Figure 1: An instance of the pipeline proposed in this article. Left to right: A filtered simplicial complex $(S,f)$ (in this case a bi-filtered graph); the Hilbert function of its $0$th dimensional homology persistence module $H_0(f) : \mathbb{R}^2 \to \mathrm{vec}$ (which in this case simply counts the number of connected components); the Hilbert decomposition signed measure $\mu_{H_0(f)}$ of the persistence module; and the convolution of the signed measure with a Gaussian kernel.
  • Figure 2: Left to right: The filtered simplicial complex of Figure 1 of the main article; its $0$th dimensional homology persistence module $H_0(f)$; the Hilbert function of $H_0(f)$; and a decomposition of the Hilbert function of $H_0(f)$ as a linear combination of Hilbert functions of finitely generated projective persistence modules: $\dim(H_0(f)) = \dim(P_{(0,1)} \oplus P_{(1,0)} \oplus P_{(2,2)}) - \dim(P_{(2,1)} \oplus P_{(1,2)})$.
  • Figure 3: A Hilbert decomposition (in the sense of Oudot and Scoccola) of the module of Figure 2, and the corresponding Hilbert decomposition signed measure. In the Hilbert decomposition, the supports of the persistence modules in yellow are interpreted as positive bars, while the supports of the persistence modules in blue are interpreted as negative bars. Since bars corresponding to finitely generated projective modules are of the form $\{y \in \mathbb{R}^n : y \geq x\}$ for some $x \in \mathbb{R}^n$, they are uniquely characterized by their corresponding $x \in \mathbb{R}^n$; this is why bars in Oudot and Scoccola are just taken to be points of $\mathbb{R}^n$.
  • Figure 4: Top: $L^1$-distance between filtering functions and Kantorovich--Rubinstein distance between the associated Hilbert signed measures. Bottom: Kantorovich--Rubinstein distance between signed measures and distance between the vectors produced by our vectorizations. Different colors indicate different runs of the random walk used to construct the filtering functions. The axes have no scale, since the scale depends on the choice of norms and of vectorization parameters.
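The pipeline described in the captions of Figures 1 to 3 can be traced on the decomposition quoted in Figure 2. The sketch below is a minimal illustration, not the authors' implementation. It reads the signed point measure off that decomposition (positive points $(0,1)$, $(1,0)$, $(2,2)$; negative points $(2,1)$, $(1,2)$), recovers the Hilbert function by counting the bars $\{y : y \geq x\}$ that contain a given point, and evaluates an unnormalized Gaussian convolution on a grid. The grid, the bandwidth `sigma`, the omitted normalization constant, and all function names are assumptions made for the example.

```python
import math

# Signed point measure read off the decomposition in the Figure 2 caption.
positive = [(0, 1), (1, 0), (2, 2)]
negative = [(2, 1), (1, 2)]

def below(p, x):
    """True if p <= x coordinate-wise, i.e. x lies in the bar {y : y >= p}."""
    return all(pi <= xi for pi, xi in zip(p, x))

def hilbert_function(x):
    """Signed count of bars containing x: positive bars minus negative bars."""
    return sum(below(p, x) for p in positive) - sum(below(p, x) for p in negative)

def gaussian_convolution(z, sigma=0.5):
    """Unnormalized Gaussian kernel convolved with the signed measure, at z."""
    def kernel(p):
        sq_dist = sum((a - b) ** 2 for a, b in zip(z, p))
        return math.exp(-sq_dist / (2 * sigma ** 2))
    return sum(kernel(p) for p in positive) - sum(kernel(p) for p in negative)

# Hypothetical 7 x 7 evaluation grid on [0, 3]^2; the sampled values form the vector.
grid = [(0.5 * i, 0.5 * j) for i in range(7) for j in range(7)]
feature_vector = [gaussian_convolution(z) for z in grid]

print([hilbert_function(x) for x in [(0, 0), (1, 1), (2, 2)]])
print(len(feature_vector))
```

With these points the Hilbert function is 0 at $(0,0)$, 2 at $(1,1)$, and 1 at $(2,2)$, consistent with a module that counts connected components which are born at $(0,1)$ and $(1,0)$ and later merge.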

Theorems & Definitions (38)

  • Example 1
  • Definition 1
  • Example 2
  • Example 3
  • Definition 2
  • Definition 3
  • Remark 1
  • Proposition 1
  • Definition 4
  • Definition 5
  • ...and 28 more