Can GNNs Learn Link Heuristics? A Concise Review and Evaluation of Link Prediction Methods
Shuming Liang, Yu Ding, Zhidong Li, Bin Liang, Siqi Zhang, Yang Wang, Fang Chen
TL;DR
The paper investigates whether aggregation-based GNNs can learn pair-specific structural cues for link prediction, with a focus on the number of common neighbors (NCN). It combines analytical arguments about the limitations of set-based neighborhood pooling with extensive experiments showing that trainable node embeddings significantly boost performance, especially in dense graphs, while NCN-dependent heuristics and SEAL-type methods have distinct strengths in sparse graphs. The findings illuminate fundamental limits of GNNs for NCN-based information and offer practical guidance on method selection conditioned on graph density, pointing to directions for developing more robust link-prediction algorithms. Overall, incorporating node embeddings and selectively leveraging NCN-aware heuristics emerge as key factors for effective link prediction across graph regimes.
Abstract
This paper examines the ability of Graph Neural Networks (GNNs) to learn various forms of information for link prediction, alongside a brief review of existing link prediction methods. Our analysis reveals that GNNs cannot effectively learn structural information related to the number of common neighbors between two nodes, primarily because the neighborhood aggregation scheme relies on set-based pooling. Our extensive experiments also indicate that trainable node embeddings can improve the performance of GNN-based link prediction models. Importantly, we observe that the denser the graph, the greater the improvement. We attribute this to a characteristic of node embeddings: the link state of each link sample can be encoded into the embeddings of the nodes involved in the neighborhood aggregation of the two nodes in that link sample. In denser graphs, every node has more opportunities to participate in the neighborhood aggregation of other nodes and to encode the states of more link samples into its embedding, thus learning better node embeddings for link prediction. Lastly, we demonstrate that these insights have important implications for identifying the limitations of existing link prediction methods, which could guide the future development of more robust algorithms.
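As a toy illustration of the common-neighbor limitation described in the abstract, the following sketch may help. It is not the paper's analysis or experimental setup. It assumes a hypothetical graph made of a triangle and a 6-cycle, constant input features, and a simple untrained sum aggregation standing in for a GNN layer; the helper names `build_graph`, `common_neighbors`, and `aggregate` are invented for this example. Because each node representation is pooled only from that node's own neighborhood, two node pairs with different numbers of common neighbors can end up with identical representations.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj: Graph, u: int, v: int) -> int:
    """Number of common neighbors (NCN) of the pair (u, v)."""
    return len(adj[u] & adj[v])


def aggregate(adj: Graph, feats: Dict[int, float], rounds: int = 2) -> Dict[int, float]:
    """Untrained sum aggregation: h_v <- h_v + sum of neighbor representations.

    This is an assumed stand-in for a set-based pooling layer, not a real GNN.
    """
    h = dict(feats)
    for _ in range(rounds):
        h = {v: h[v] + sum(h[n] for n in adj[v]) for v in adj}
    return h


# Hypothetical toy graph: a triangle (0, 1, 2) and a 6-cycle (3..8).
# Every node has degree 2.
edges = [(0, 1), (1, 2), (2, 0),
         (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 3)]
adj = build_graph(edges)

# Assumption: constant input features, so only structure can distinguish nodes.
feats = {v: 1.0 for v in adj}
h = aggregate(adj, feats)

# The linked pairs (0, 1) and (3, 4) differ in NCN ...
ncn_triangle = common_neighbors(adj, 0, 1)
ncn_cycle = common_neighbors(adj, 3, 4)

# ... yet their node representations are the same, so any scorer that
# only sees (h[u], h[v]) must give both pairs the same score.
same_pair_representation = (h[0], h[1]) == (h[3], h[4])
print(ncn_triangle, ncn_cycle, same_pair_representation)
```

In this sketch the two pairs are indistinguishable from their node representations alone, even though one pair shares a neighbor and the other does not. Giving each node a distinct trainable embedding, or computing pair-specific features such as NCN directly, would break this symmetry in the toy case.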
