EMP: Effective Multidimensional Persistence for Graph Representation Learning
Ignacio Segovia-Dominguez, Yuzhou Chen, Cuneyt G. Akcora, Zhiwei Zhen, Murat Kantarcioglu, Yulia R. Gel, Baris Coskunuzer
TL;DR
The paper addresses a limitation of single-parameter persistent homology in graph representation learning by introducing Effective Multidimensional Persistence (EMP), a slicing-based framework that varies two scale parameters simultaneously to produce multidimensional topological fingerprints. EMP uses a first filter $f$ to build a nested sequence of subgraphs, computes a persistence diagram on each subgraph with a second filter $g$, and applies a fixed-size single-parameter vectorization $\varphi$ to each diagram. Stacking the resulting vectors yields a matrix $\mathbf{M}_\varphi$ that ML models can consume directly. The authors provide theoretical stability guarantees for EMP summaries and report superior or competitive performance on nine graph-classification benchmarks, often outperforming state-of-the-art methods while offering interpretable topological features. EMP thus offers a practical bridge between multipersistence theory and machine learning, giving flexible, scalable, stable, and ML-friendly graph representations.
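The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes node degree as the first filter $f$, a made-up node attribute as the second filter $g$, sublevel filtrations for both, 0-dimensional homology only, and the Betti-0 curve as the vectorization $\varphi$. The function names (`zero_dim_persistence`, `betti_curve`, `emp_betti0`) and the toy graph are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
Bar = Tuple[float, float]


def zero_dim_persistence(nodes: List[int], edges: List[Edge],
                         g: Dict[int, float]) -> List[Bar]:
    """0-dimensional sublevel persistence of g on a graph (union-find, elder rule)."""
    parent = {v: v for v in nodes}
    birth = {v: g[v] for v in nodes}  # birth value of the component rooted at v

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    bars: List[Bar] = []
    # An edge enters the filtration once both of its endpoints are present.
    for u, v in sorted(edges, key=lambda e: max(g[e[0]], g[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge closes a cycle; no change in H0
        t = max(g[u], g[v])
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru  # make ru the older component
        bars.append((birth[rv], t))  # the younger component dies at t
        parent[rv] = ru
    # Components that never merge persist forever.
    bars.extend((birth[r], float("inf")) for r in {find(v) for v in nodes})
    return bars


def betti_curve(bars: List[Bar], thresholds: List[float]) -> List[int]:
    """Fixed-size vectorization: number of bars alive at each threshold."""
    return [sum(1 for b, d in bars if b <= t < d) for t in thresholds]


def emp_betti0(nodes: List[int], edges: List[Edge],
               f: Dict[int, float], g: Dict[int, float],
               f_thresholds: List[float],
               g_thresholds: List[float]) -> List[List[int]]:
    """One row per f-threshold: Betti-0 curve of g on the subgraph {f <= a}."""
    rows = []
    for a in f_thresholds:
        kept = [v for v in nodes if f[v] <= a]
        kept_set = set(kept)
        sub_edges = [(u, v) for u, v in edges if u in kept_set and v in kept_set]
        bars = zero_dim_persistence(kept, sub_edges, g)
        rows.append(betti_curve(bars, g_thresholds))
    return rows


# Toy graph (illustrative data, not from the paper).
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
f = {v: float(sum(v in e for e in edges)) for v in nodes}  # first filter: degree
g = {0: 0.1, 1: 0.5, 2: 0.9, 3: 0.3, 4: 0.7}               # second filter: toy attribute

M = emp_betti0(nodes, edges, f, g,
               f_thresholds=[1.0, 2.0, 3.0],
               g_thresholds=[0.2, 0.6, 1.0])
print(M)
```

With this choice of $\varphi$, entry $(i, j)$ of `M` is the number of connected components of the subgraph induced by nodes with $f \le a_i$ and $g \le b_j$. Swapping `betti_curve` for another fixed-size vectorization (a landscape, silhouette, or persistence image) changes the width of each row but not the overall structure of the loop.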
Abstract
Topological data analysis (TDA) is gaining prominence across a wide spectrum of machine learning tasks, from manifold learning to graph classification. A pivotal technique within TDA is persistent homology (PH), which provides a distinctive topological fingerprint of data by tracing the evolution of latent structures as a scale parameter changes. Current PH tools are confined to analyzing data through a single filter parameter. However, many scenarios require multiple relevant parameters to be considered in order to gain finer insights into the data. We address this issue by introducing the Effective Multidimensional Persistence (EMP) framework, which enables the exploration of data by simultaneously varying multiple scale parameters. The framework integrates descriptor functions into the analysis process, yielding a highly expressive data summary. It extends established single-parameter PH summaries to multidimensional counterparts such as EMP Landscapes, Silhouettes, Images, and Surfaces. These summaries represent the multidimensional aspects of the data as matrices and arrays, which align well with diverse ML models. We provide theoretical guarantees and stability proofs for EMP summaries. We demonstrate EMP's utility in graph classification tasks: EMP enhances various single-parameter PH descriptors and outperforms cutting-edge methods on multiple benchmark datasets.
