
PHLP: Sole Persistent Homology for Link Prediction - Interpretable Feature Extraction

Junwon You, Eunwoo Heo, Jae-Hun Jung

TL;DR

PHLP introduces a purely topological approach to link prediction (LP) that compares the persistence information of angle-hop subgraphs with and without the target link. It uses Degree DRNL labeling and persistence images to produce interpretable feature vectors, which are fed to a simple MLP. The method performs competitively with state-of-the-art (SOTA) models and achieves SOTA results on the Power dataset. The framework extends to MA-PHLP, which aggregates information across multiple angles, and to hybrids that augment existing LP models; a stability theorem guarantees robustness of the persistence diagrams. Overall, PHLP demonstrates that topological features can reveal key factors driving LP performance while avoiding the opacity of deep neural networks.

Abstract

Link prediction (LP), which infers the connectivity between nodes, is a significant research area in graph data, where a link represents essential information about the relationship between nodes. Although graph neural network (GNN)-based models have achieved high performance in LP, understanding why they perform well is challenging because most comprise complex neural networks. We employ persistent homology (PH), a topological data analysis method that helps analyze the topological information of graphs, to interpret the features used for prediction. We propose a novel method that employs PH for LP (PHLP), focusing on how the presence or absence of target links influences the overall topology. PHLP uses the angle hop subgraph and a new node labeling called degree double radius node labeling (Degree DRNL), which distinguishes the information of graphs better than DRNL. Using only a classifier, PHLP performs similarly to state-of-the-art (SOTA) models on most benchmark datasets. Incorporating the outputs calculated using PHLP into existing GNN-based SOTA models improves performance across all benchmark datasets. To the best of our knowledge, PHLP is the first method to apply PH to LP without GNNs. The proposed approach, which employs PH without relying on neural networks, enables the identification of crucial factors for improving performance.
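
For readers unfamiliar with the DRNL baseline that the abstract compares against, the following is a minimal sketch of plain DRNL as it is commonly defined for the SEAL link prediction model. It is not Degree DRNL, whose definition is not given in this summary. The adjacency-dictionary input and the function names `bfs_distances` and `drnl` are assumptions made for illustration.

```python
from collections import deque

def bfs_distances(adj, source, removed):
    """Hop distances from `source`, ignoring the node `removed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y != removed and y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def drnl(adj, u, v):
    """Plain DRNL labels for target nodes u and v (illustrative sketch)."""
    du = bfs_distances(adj, u, removed=v)  # distance to u with v masked out
    dv = bfs_distances(adj, v, removed=u)  # distance to v with u masked out
    labels = {}
    for node in adj:
        if node in (u, v):
            labels[node] = 1               # both target nodes get label 1
        elif node not in du or node not in dv:
            labels[node] = 0               # unreachable from u or from v
        else:
            d = du[node] + dv[node]
            labels[node] = 1 + min(du[node], dv[node]) + (d // 2) * (d // 2 + d % 2 - 1)
    return labels

# Toy graph (hypothetical): target nodes 0 and 1, node 4 is isolated.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2], 4: []}
labels = drnl(adj, 0, 1)
print(labels)
```

Because the label depends only on the pair of hop distances to the two target nodes, two structurally different subgraphs can receive identical label values, which is the limitation that Degree DRNL is said to address.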

Paper Structure

This paper contains 23 sections, 3 theorems, 13 equations, 8 figures, 10 tables, 2 algorithms.

Key Result

Theorem 1

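Let $G^{(u,v)} = (V,E)$ be a graph with target nodes $u,v$ in $V$ and let $f^{(u,v)}$ and $g^{(u,v)}$ be two node labeling functions defined on $G^{(u,v)}$. Denote the $p$-dimensional persistence diagrams of $G^{(u,v)}$ obtained from the filtrations constructed by $f^{(u,v)}$ and $g^{(u,v)}$ as $\mathrm{dgm}_p(f^{(u,v)})$ and $\mathrm{dgm}_p(g^{(u,v)})$, respectively. Then $D_B\left(\mathrm{dgm}_p(f^{(u,v)}), \mathrm{dgm}_p(g^{(u,v)})\right) \le \lVert f^{(u,v)} - g^{(u,v)} \rVert_\infty$, where $D_B$ is the bottleneck distance and $\lVert \cdot \rVert_\infty$ is the infinity norm.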
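
The stability bound can be checked numerically on a toy graph. The sketch below is an illustration under stated assumptions, not the paper's implementation: it assumes dimension $p = 0$ and a lower-star filtration in which a node enters at its label and an edge enters at the larger of its endpoint labels. It computes the bottleneck distance by brute force, which is only practical for very small diagrams. All function names and the example labelings `f` and `g` are hypothetical.

```python
from itertools import permutations

def zero_dim_diagram(n_nodes, edges, label):
    """0-dim persistence of the lower-star filtration: (finite pairs, essential births)."""
    parent = list(range(n_nodes))
    birth = list(label)  # birth[r] = smallest label in the component rooted at r

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    finite = []
    for a, b in sorted(edges, key=lambda e: max(label[e[0]], label[e[1]])):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        t = max(label[a], label[b])      # filtration value of this edge
        if birth[ra] > birth[rb]:
            ra, rb = rb, ra              # elder rule: ra is the older component
        if birth[rb] < t:                # skip zero-persistence pairs
            finite.append((birth[rb], t))
        parent[rb] = ra
    essential = sorted(birth[r] for r in {find(x) for x in range(n_nodes)})
    return finite, essential

def bottleneck_finite(a, b):
    """Brute-force bottleneck distance between two small finite diagrams."""
    n, m = len(a), len(b)
    size = n + m
    if size == 0:
        return 0.0

    def cost(i, j):
        if i < n and j < m:              # point of a matched to point of b
            return max(abs(a[i][0] - b[j][0]), abs(a[i][1] - b[j][1]))
        if i < n:                        # point of a matched to the diagonal
            return (a[i][1] - a[i][0]) / 2
        if j < m:                        # point of b matched to the diagonal
            return (b[j][1] - b[j][0]) / 2
        return 0.0                       # diagonal matched to diagonal

    return min(max(cost(i, p[i]) for i in range(size))
               for p in permutations(range(size)))

def bottleneck(diag_a, diag_b):
    (fin_a, ess_a), (fin_b, ess_b) = diag_a, diag_b
    # Essential classes (never dying) are matched in sorted order of birth.
    ess = max((abs(x - y) for x, y in zip(ess_a, ess_b)), default=0.0)
    return max(bottleneck_finite(fin_a, fin_b), ess)

# Path graph on 5 nodes with two labelings that differ by at most 0.5.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
f = [1.0, 3.0, 2.0, 5.0, 4.0]
g = [1.5, 2.5, 2.0, 4.5, 4.5]
d_b = bottleneck(zero_dim_diagram(5, edges, f), zero_dim_diagram(5, edges, g))
sup_norm = max(abs(x - y) for x, y in zip(f, g))
assert d_b <= sup_norm + 1e-9
```

The final assertion is the inequality of Theorem 1 for this example. For larger graphs a dedicated library routine for the bottleneck distance would be preferable to the factorial-time matching used here.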

Figures (8)

  • Figure 1: Difference between the GNN-based and proposed methods. (Left) The GNN-based method extracts feature vectors through optimization (dashed area), making it difficult to interpret what these vectors represent. (Right) The proposed method extracts feature vectors through the designed analysis process, resulting in interpretable vectors.
  • Figure 2: Topological features in subgraphs with and without a target link $(u,v)$. The diagram illustrates the topological information extraction process for the subgraph $\mathcal{N}$, as described in the paper's subsection on persistent homology. The presence (top) or absence (bottom) of the target link changes the topological structure of the graph. Top row: When the target link is present, three features ($C_1$, $C_2$, and $C_3$) are detected, as shown in the persistence image (PI) in the right column. The PI represents the topological features of the subgraph $\mathcal{N}$ (see the paper's subsection on persistence images). Bottom row: When the target link is absent, only two features ($C_2$ and $C_3$) are detected, as depicted in the corresponding PI.
  • Figure 3: Overall structure of persistent homology for link prediction (PHLP) and multiangle PHLP (MA-PHLP). (a) PHLP calculates the topological information based on the existence of target links in angle hop subgraphs for each target node. (b) With a classifier, MA-PHLP integrates topological information across various angles to perform LP.
  • Figure 4: The motivation for the $(k,l)$-angle hop subgraph. Just as photographs of an apple taken from multiple angles give a comprehensive understanding of it, subgraphs extracted from various perspectives give a fuller view of the graph. This figure illustrates the capability to extract subgraphs from various perspectives.
  • Figure 5: Node labeling on graphs. (a) With DRNL, node label values that do not consider the graph structure cannot distinguish between $G_1$ and $G_2$. (b) Applying Degree DRNL allows $G_1$ and $G_2$ to be distinguished solely by node label values.
  • ...and 3 more figures
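
The persistence images mentioned in the Figure 2 caption can be sketched as follows. This is a simplified illustration and not the paper's implementation: it evaluates an unnormalized Gaussian at pixel centers instead of integrating over each pixel, uses a linear persistence weight, and concatenates the two images into one vector. The grid range, `sigma`, the example diagrams, and the concatenation step are all assumptions.

```python
import math

def persistence_image(diagram, resolution=5, sigma=0.5, lo=0.0, hi=4.0):
    """Rasterize finite (birth, death) pairs onto a resolution x resolution grid."""
    step = (hi - lo) / resolution
    centers = [lo + (k + 0.5) * step for k in range(resolution)]
    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, death in diagram:
        pers = death - birth                    # work in (birth, persistence) coordinates
        for r, y in enumerate(centers):         # y runs along the persistence axis
            for c, x in enumerate(centers):     # x runs along the birth axis
                bump = math.exp(-((x - birth) ** 2 + (y - pers) ** 2) / (2 * sigma ** 2))
                image[r][c] += pers * bump      # linear weighting by persistence
    return image

def flatten(image):
    return [value for row in image for value in row]

# Hypothetical diagrams for a subgraph with and without the target link.
with_link = [(1.0, 3.0), (2.0, 2.5)]
without_link = [(2.0, 2.5)]

pi_with = flatten(persistence_image(with_link))
pi_without = flatten(persistence_image(without_link))
features = pi_with + pi_without                 # one possible input vector for a classifier
print(len(features))
```

Each pixel of the resulting vector corresponds to a fixed region of the birth and persistence plane, which is what makes this kind of feature vector readable: a change between the two halves of `features` points to topological features that appear or vanish when the target link is toggled.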

Theorems & Definitions (6)

  • Definition 1: Graph isomorphism
  • Definition 2: Graph isomorphism with target nodes
  • Theorem 1: Stability theorem
  • Corollary 1
  • Proof
  • Corollary 2