Vectorization of Persistence Diagrams for Topological Data Analysis in R and Python Using TDAvec Package
Aleksei Luchinsky, Umar Islambekov
TL;DR
Persistence diagrams (PDs) capture topological features across scales, but the space of PDs is not a Hilbert space, which complicates their direct use in machine learning (ML). The paper presents TDAvec, a cross-language library (R and Python) that vectorizes persistence diagrams using an integration-based scheme, enabling stable, ML-ready representations. It implements eight core vectorizations (plus additional ones) with a fast C++ backend and a consistent API across R and Python. The work includes usage examples, performance claims (notably substantial speedups over existing tooling), and an appendix with formal definitions of the vectorizations, which facilitates systematic comparison of vectorization strategies. Overall, TDAvec lowers barriers to incorporating topological descriptors into ML pipelines and supports reproducible, cross-language experimentation in TDA.
Abstract
Persistent homology is a widely used tool in topological data analysis (TDA) for understanding the underlying shape of complex data. By constructing a filtration of simplicial complexes from data points, it captures topological features such as connected components, loops, and voids across multiple scales. These features are encoded in persistence diagrams (PDs), which provide a concise summary of the data's topological structure. However, the space of PDs is not a Hilbert space, which poses challenges for their direct use in machine learning applications. To address this, kernel methods and vectorization techniques have been developed to transform PDs into machine-learning-compatible formats. In this paper, we introduce a new software package designed to streamline the vectorization of PDs, offering an intuitive workflow and advanced functionalities. We demonstrate the necessity of the package through practical examples and provide a detailed discussion of its contributions to applied TDA. Definitions of all vectorization summaries used in the package are included in the appendix.
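To make the idea of vectorization concrete, the following is a minimal NumPy sketch of one common summary, the Betti curve, turned into a fixed-length vector by averaging the curve over each interval of a grid. It is an illustration only and does not use the package's API: the function name `betti_curve_vector` and the toy diagram are hypothetical, and the interval-averaging convention is an assumption about what an integration-based scheme could look like.

```python
import numpy as np


def betti_curve_vector(diagram, grid):
    """Average the Betti curve of a persistence diagram over grid intervals.

    Hypothetical helper for illustration; not part of any library API.
    diagram: array-like of (birth, death) pairs with finite deaths.
    grid:    increasing 1-D array of n + 1 scale values.
    Returns a vector of length n whose i-th entry is the mean number of
    features alive on [grid[i], grid[i + 1]].
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)

    # Column vectors of births and deaths, row vectors of interval endpoints,
    # so that broadcasting yields a (points x intervals) matrix.
    birth = diagram[:, 0][:, None]
    death = diagram[:, 1][:, None]
    left = grid[:-1][None, :]
    right = grid[1:][None, :]

    # Length of the overlap between each lifetime [birth, death) and each
    # grid interval; this is the integral of the feature's indicator function.
    overlap = np.clip(np.minimum(death, right) - np.maximum(birth, left), 0.0, None)

    # Sum over features, then divide by interval width to get the average.
    return overlap.sum(axis=0) / (grid[1:] - grid[:-1])


# Toy diagram with two features: one alive on [0, 1), one on [0.5, 2).
diagram = np.array([[0.0, 1.0], [0.5, 2.0]])
grid = np.linspace(0.0, 2.0, 5)  # 0.0, 0.5, 1.0, 1.5, 2.0
vec = betti_curve_vector(diagram, grid)
print(vec)
```

The resulting vector has one entry per grid interval regardless of how many points the diagram contains, which is what allows diagrams of different sizes to be fed to standard machine learning models.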
