Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures
David Loiseaux, Luis Scoccola, Mathieu Carrière, Magnus Bakke Botnan, Steve Oudot
TL;DR
The paper tackles stable, scalable vectorization of multiparameter persistent homology (MPH) by treating signed barcodes as signed measures. It introduces two vectorizations, one based on convolution and one based on a sliced Wasserstein kernel, both with provable Lipschitz stability, so that small changes in the descriptor produce only small changes in the extracted features. The Hilbert decomposition signed measure provides a practical, efficient representation, and Euler-based signed measures offer an alternative; both feed into the same stable, computable vectorizations. Experiments on point clouds, graphs, and virtual screening show performance competitive with state-of-the-art topology-based methods and strong baselines. The work also discusses limitations and future directions, including differentiability of the descriptors and integration with neural architectures for learned, data-adaptive discretizations.
Abstract
Persistent homology (PH) provides topological descriptors for geometric data, such as weighted graphs, which are interpretable, stable to perturbations, and invariant under, e.g., relabeling. Most applications of PH focus on the one-parameter case -- where the descriptors summarize the changes in topology of data as it is filtered by a single quantity of interest -- and there is now a wide array of methods enabling the use of one-parameter PH descriptors in data science, which rely on the stable vectorization of these descriptors as elements of a Hilbert space. Although the multiparameter PH (MPH) of data that is filtered by several quantities of interest encodes much richer information than its one-parameter counterpart, the scarceness of stability results for MPH descriptors has so far limited the available options for the stable vectorization of MPH. In this paper, we aim to bring together the best of both worlds by showing how the interpretation of signed barcodes -- a recent family of MPH descriptors -- as signed measures leads to natural extensions of vectorization strategies from one parameter to multiple parameters. The resulting feature vectors are easy to define and to compute, and provably stable. While, as a proof of concept, we focus on simple choices of signed barcodes and vectorizations, we already see notable performance improvements when comparing our feature vectors to state-of-the-art topology-based methods on various types of data.
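To make the idea of vectorizing a signed measure concrete, here is a minimal sketch of a convolution-style feature map. It is not the authors' implementation. It assumes the signed barcode has already been turned into a finite signed point measure, that is, a list of points in parameter space with positive or negative weights. The function name `convolve_signed_measure`, the unnormalized Gaussian kernel, the bandwidth, and the evaluation grid are all illustrative choices.

```python
import numpy as np

def convolve_signed_measure(points, weights, grid, bandwidth=0.1):
    """Evaluate (mu * K)(x) = sum_i w_i * K(x - p_i) at each grid point x.

    points  : (n, d) locations of the point masses (assumed input format)
    weights : (n,) signed masses, e.g. +1 / -1
    grid    : (m, d) locations where the convolution is sampled
    K is an unnormalized Gaussian; this kernel choice is an assumption.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Pairwise squared distances between grid points and point masses: (m, n).
    diffs = grid[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    # Weighted sum over point masses gives one feature per grid point.
    return kernel @ weights

# Hypothetical two-parameter example: one positive and one negative mass.
xs = np.linspace(0.0, 1.0, 5)
grid = np.stack(np.meshgrid(xs, xs, indexing="ij"), axis=-1).reshape(-1, 2)
points = [[0.2, 0.3], [0.6, 0.7]]
weights = [1.0, -1.0]
features = convolve_signed_measure(points, weights, grid, bandwidth=0.2)
print(features.shape)  # (25,)
```

The output is a fixed-length vector, one entry per grid point, that can be passed to any standard learning pipeline. Because the map is linear in the weights, a positive and a negative mass at the same location cancel exactly, which is the behavior one expects from a signed measure.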
