Attribute-Enhanced Similarity Ranking for Sparse Link Prediction
João Mattos, Zexi Huang, Mert Kosan, Ambuj Singh, Arlei Silva
TL;DR
Gelato tackles link prediction in sparse graphs by reframing it as a similarity-ranking problem rather than binary classification. It integrates node attributes into topology through a lightweight graph-learning step, applies Autocovariance as a global topological heuristic, and trains with an N-pair ranking loss using partitioned negative sampling to expose hard negatives. Across four datasets, Gelato consistently outperforms state-of-the-art GNN-based methods under unbiased testing and scales efficiently thanks to sparse computations. The work also scrutinizes evaluation practices, arguing that unbiased, rank-based metrics are essential for meaningful assessment in sparse graphs.
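To make the ranking objective concrete, here is a minimal sketch of an N-pair-style loss for one positive pair scored against several negative pairs. It assumes the common softmax form, log(1 + sum_j exp(s_neg_j - s_pos)); the function name `n_pair_loss` and its arguments are illustrative choices, and Gelato's exact formulation may differ in detail.

```python
import math

def n_pair_loss(pos_score, neg_scores):
    """Loss for one positive pair against a list of negative pairs.

    Equals log(1 + sum_j exp(neg_j - pos)), i.e. the negative log-softmax
    of the positive score among all candidates. (Assumed form, see lead-in.)
    """
    # The leading 0.0 is the positive's own term: exp(pos - pos) = 1.
    diffs = [0.0] + [s - pos_score for s in neg_scores]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(diffs)
    return m + math.log(sum(math.exp(d - m) for d in diffs))
```

The loss shrinks toward zero as the positive pair's similarity exceeds that of every negative, and it is dominated by the highest-scoring (hardest) negatives, which is why the choice of negative sampling scheme matters.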
Abstract
Link prediction is a fundamental problem in graph data. In its most realistic setting, the problem consists of predicting missing or future links between random pairs of nodes from the set of disconnected pairs. Graph Neural Networks (GNNs) have become the predominant framework for link prediction. GNN-based methods treat link prediction as a binary classification problem and handle the extreme class imbalance (real graphs are very sparse) by sampling (uniformly at random) a balanced number of disconnected pairs not only for training but also for evaluation. However, we show that the reported performance of GNNs for link prediction in the balanced setting does not translate to the more realistic imbalanced setting and that simpler topology-based approaches are often better at handling sparsity. These findings motivate Gelato, a similarity-based link-prediction method that applies (1) graph learning based on node attributes to enhance a topological heuristic, (2) a ranking loss for addressing class imbalance, and (3) a negative sampling scheme that efficiently selects hard training pairs via graph partitioning. Experiments show that Gelato outperforms existing GNN-based alternatives.
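The gap between balanced and imbalanced evaluation can be illustrated with a toy example. The scores below are invented for illustration and are not results from the paper; the helper names `auc` and `precision_at_k` are likewise hypothetical. AUC depends only on how positive/negative pairs are ordered relative to each other, so its expected value is the same under uniform balanced sampling of negatives, whereas precision@k ranks all candidate pairs together.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def precision_at_k(pos_scores, neg_scores, k):
    """Share of true links among the k highest-scored candidate pairs."""
    ranked = sorted(
        [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
        key=lambda t: t[0],
        reverse=True,
    )
    return sum(label for _, label in ranked[:k]) / k

# Toy, made-up scores: 10 true links and 990 disconnected pairs,
# 50 of which the scorer wrongly rates above every true link.
pos = [0.9] * 10
neg = [0.95] * 50 + [0.1] * 940

auc_value = auc(pos, neg)                # about 0.95: looks strong
p_at_10 = precision_at_k(pos, neg, 10)   # 0.0: no true link in the top 10
```

In this sketch, a scorer that misranks only about 5% of the negatives still fills the entire top of the ranking with false links, because negatives vastly outnumber positives. This is the kind of effect that rank-based evaluation over the full set of disconnected pairs exposes.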
