Can GNNs Learn Link Heuristics? A Concise Review and Evaluation of Link Prediction Methods

Shuming Liang, Yu Ding, Zhidong Li, Bin Liang, Siqi Zhang, Yang Wang, Fang Chen

TL;DR

The paper investigates whether aggregation-based GNNs can learn pair-specific structural cues for link prediction, with a focus on the number of common neighbors (NCN). It combines analytical arguments about the limitations of set-based neighborhood pooling with extensive experiments showing that trainable node embeddings significantly boost performance, especially in dense graphs, while NCN-dependent heuristics and SEAL-type methods have distinct strengths in sparse graphs. The findings illuminate fundamental limits of GNNs for NCN-based information and offer practical guidance on method selection conditioned on graph density, pointing to directions for developing more robust link-prediction algorithms. Overall, incorporating node embeddings and selectively leveraging NCN-aware heuristics emerge as key factors for effective link prediction across graph regimes.

Abstract

This paper explores the ability of Graph Neural Networks (GNNs) to learn various forms of information for link prediction, alongside a brief review of existing link prediction methods. Our analysis reveals that GNNs cannot effectively learn structural information related to the number of common neighbors between two nodes, primarily because the neighborhood aggregation scheme relies on set-based pooling. In addition, our extensive experiments indicate that trainable node embeddings can improve the performance of GNN-based link prediction models. Importantly, we observe that the denser the graph, the greater the improvement. We attribute this to a characteristic of node embeddings: the link state of each link sample can be encoded into the embeddings of the nodes involved in the neighborhood aggregation of the two nodes in that sample. In denser graphs, every node has more opportunities to participate in the neighborhood aggregation of other nodes and to encode the states of more link samples into its embedding, and thus learns better node embeddings for link prediction. Lastly, we demonstrate that these insights have important implications for identifying the limitations of existing link prediction methods, which could guide the future development of more robust algorithms.
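
The claim about set-based pooling can be illustrated with a toy example. The sketch below is not the paper's analysis or code: the two small graphs, the constant node features, and the helper names (`build`, `common_neighbors`, `mean_aggregate`, `pair_repr`) are all assumptions made for illustration. It shows two node pairs whose number of common neighbors differs (2 versus 0) while one round of mean pooling over uninformative features yields identical representations for both pairs.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def common_neighbors(g: Graph, u: str, v: str) -> int:
    """Number of common neighbors (NCN) of u and v."""
    return len(g[u] & g[v])


def mean_aggregate(g: Graph, feats: Dict[str, List[float]], node: str) -> List[float]:
    """One round of mean pooling over the neighbor set of `node`.

    The pooling sees only the multiset of neighbor features, not which
    specific nodes they came from.
    """
    neigh = sorted(g[node])
    dim = len(feats[node])
    return [sum(feats[n][d] for n in neigh) / len(neigh) for d in range(dim)]


def pair_repr(g: Graph, u: str = "u", v: str = "v") -> Tuple[List[float], List[float]]:
    """Representations of u and v under constant (hypothetical) node features."""
    feats = {n: [1.0] for n in g}
    return mean_aggregate(g, feats, u), mean_aggregate(g, feats, v)


# Hypothetical toy graphs: u and v each have two neighbors in both cases.
shared = build([("u", "a"), ("u", "b"), ("v", "a"), ("v", "b")])    # NCN = 2
disjoint = build([("u", "a"), ("u", "b"), ("v", "c"), ("v", "d")])  # NCN = 0

print(common_neighbors(shared, "u", "v"), common_neighbors(disjoint, "u", "v"))
print(pair_repr(shared) == pair_repr(disjoint))
```

Any link score computed only from these two pooled vectors cannot separate the two pairs, although their NCN differs. Giving each node a distinct trainable embedding would change the pooled vectors, which is in line with the abstract's observation that node embeddings help.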

Paper Structure

This paper contains 27 sections, 2 equations, 6 figures, 5 tables, 2 algorithms.

Figures (6)

  • Figure 1: An illustration of neighborhood information propagation and aggregation in GNNs, where $a_{i,j}$ can be an edge weight or attention weight from node $j$ to $i$.
  • Figure 2: The results of Algorithm \ref{alg:GNNLP} on four OGB link prediction datasets, using heuristic encoding (HE) only, node features (X) only, node embeddings (NE) only, or their combinations. The data splits and evaluation metrics follow the official OGB evaluation protocol (Hu et al., 2020).
  • Figure 3: The algorithmic flow of SEAL-type link prediction methods.
  • Figure 4: Node labeling in SEAL-type methods. The left subgraph is specific to a positive link sample and the right one to a negative link sample. The labeling features are based on the shortest path distances (SPDs) from every node to the target pair of nodes (only the first-order neighbors of node $v$ or $w$ are shown here). For example, on the left, the node labeled $(1,10)$ has an SPD of $1$ to node $v$ and an SPD of $10$ to node $u$.
  • Figure 5: Results of different methods for link prediction on four OGB datasets. For MLP and general GNNs, we present the results obtained with node embeddings, given the dominant performance of node embeddings shown in Fig. 2.
  • ...and 1 more figure
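
The SPD-based node labeling described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the SEAL implementation: the adjacency-map representation and the function names `bfs_distances` and `spd_labels` are hypothetical, and the exact labeling function of a given SEAL-type method may differ (for example, it may combine the two distances into a single label). Each node receives the pair (SPD to the first target node, SPD to the second target node), computed by breadth-first search.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def bfs_distances(g: Graph, source: str) -> Dict[str, int]:
    """Unweighted shortest path distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in g[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def spd_labels(g: Graph, v: str, u: str) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Label every node with (SPD to v, SPD to u); None marks an unreachable target."""
    dist_v = bfs_distances(g, v)
    dist_u = bfs_distances(g, u)
    return {n: (dist_v.get(n), dist_u.get(n)) for n in g}


# Hypothetical subgraph around the target pair (v, u): a path v - a - u - b,
# plus an isolated node c.
subgraph: Graph = {
    "v": {"a"},
    "a": {"v", "u"},
    "u": {"a", "b"},
    "b": {"u"},
    "c": set(),
}

labels = spd_labels(subgraph, "v", "u")
print(labels["a"], labels["b"], labels["c"])
```

Because the labels are computed relative to one specific target pair, they are pair-specific features, unlike the node-level representations produced by plain neighborhood aggregation.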

Theorems & Definitions (5)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 6