
Implicit degree bias in the link prediction task

Rachith Aiyappa, Xin Wang, Munjung Kim, Ozgur Can Seckin, Jisung Yoon, Yong-Yeol Ahn, Sadamori Kojaku

TL;DR

The paper reveals a strong degree bias in the standard link prediction benchmark caused by edge sampling, where positive edges overrepresent high-degree nodes (with $p_{\text{pos}}(k)=\frac{1}{\langle k \rangle} k p(k)$) while negatives remain degree-balanced as $p_{\text{neg}}(k)=p(k)$. It introduces a degree-corrected benchmark that matches the degree distributions of positives and negatives, showing that degree-only methods lose their advantage and that alignment with recommendation tasks improves, including better training outcomes for GNNs on community-detection benchmarks. The work demonstrates that current evaluations can overfit to node degrees and that the corrected benchmark provides more faithful assessments and learning signals, ultimately improving structure discovery in graphs. These findings have practical implications for fairer, more robust graph ML evaluations and training, with code and data made available to promote reproducibility and broader adoption.
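
The relation $p_{\text{pos}}(k)=\frac{1}{\langle k \rangle} k p(k)$ can be checked exactly on a small graph. The following is a minimal sketch, not the authors' code: the function name `degree_distributions` and the toy edge list are illustrative assumptions, and the graph is assumed to have no isolated nodes. It counts how often each degree appears among the endpoints of the edge list and compares the result with $k p(k) / \langle k \rangle$.

```python
from collections import Counter
from fractions import Fraction


def degree_distributions(edges):
    """Return (p, p_pos) for an undirected edge list (hypothetical helper).

    p[k]     : probability that a uniformly chosen node has degree k
    p_pos[k] : probability that an endpoint of a uniformly chosen edge
               has degree k
    Assumes every node appears in at least one edge.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1

    n, m = len(deg), len(edges)

    # Degree distribution of uniformly sampled nodes.
    node_counts = Counter(deg.values())
    p = {k: Fraction(c, n) for k, c in node_counts.items()}

    # Degree distribution of the endpoints of uniformly sampled edges.
    endpoint_counts = Counter()
    for u, v in edges:
        endpoint_counts[deg[u]] += 1
        endpoint_counts[deg[v]] += 1
    p_pos = {k: Fraction(c, 2 * m) for k, c in endpoint_counts.items()}
    return p, p_pos


# Toy graph: node 0 is a hub of degree 4, node 4 has degree 2, the rest degree 1.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)]
p, p_pos = degree_distributions(edges)

mean_k = sum(k * pk for k, pk in p.items())
predicted = {k: k * pk / mean_k for k, pk in p.items()}

for k in sorted(p):
    print(k, p[k], p_pos[k], predicted[k])
```

In this toy graph only one node in six has degree 4, yet degree-4 endpoints make up two fifths of all edge endpoints, which is the overrepresentation the TL;DR describes.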

Abstract

Link prediction -- a task of distinguishing actual hidden edges from random unconnected node pairs -- is one of the quintessential tasks in graph machine learning. Despite being widely accepted as a universal benchmark and a downstream task for representation learning, the validity of the link prediction benchmark itself has rarely been questioned. Here, we show that the common edge sampling procedure in the link prediction task has an implicit bias toward high-degree nodes and produces a highly skewed evaluation that favors methods overly dependent on node degree, to the extent that a "null" link prediction method based solely on node degree can yield nearly optimal performance. We propose a degree-corrected link prediction task that offers a more reasonable assessment that aligns better with the performance in the recommendation task. Finally, we demonstrate that the degree-corrected benchmark can more effectively train graph machine-learning models by reducing overfitting to node degrees and facilitating the learning of relevant structures in graphs.
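
One way to give negative pairs a degree distribution close to that of the positive edges is to draw both endpoints with probability proportional to degree. The sketch below illustrates that idea under stated assumptions and should not be read as the paper's algorithm: the function name `sample_degree_matched_negatives`, the rejection loop, and the `max_tries` cap are all hypothetical. Rejecting self-loops and existing edges perturbs the resulting distribution slightly, so the match is approximate.

```python
import random


def sample_degree_matched_negatives(edges, num_samples, seed=0, max_tries=100_000):
    """Sample unconnected node pairs with degree-proportional endpoints.

    Illustrative sketch only. Each node appears in `stubs` once per incident
    edge, so a uniform draw from `stubs` picks a node with probability
    proportional to its degree.
    """
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    stubs = [node for edge in edges for node in edge]

    negatives = set()
    tries = 0
    while len(negatives) < num_samples and tries < max_tries:
        tries += 1
        u, v = rng.choice(stubs), rng.choice(stubs)
        pair = frozenset((u, v))
        # Reject self-loops, existing edges, and pairs already drawn.
        if u == v or pair in existing or pair in negatives:
            continue
        negatives.add(pair)
    return [tuple(sorted(pair)) for pair in negatives]


# Toy graph: a hub (node 0) connected to six leaves, plus two extra edges.
edges = [(0, i) for i in range(1, 7)] + [(1, 2), (3, 4)]
negatives = sample_degree_matched_negatives(edges, num_samples=5)
print(negatives)
```

Uniform sampling of node pairs would instead pick each endpoint with probability $1/N$ regardless of degree, which is the source of the mismatch between positives and negatives.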

Paper Structure (5 sections, 9 equations, 3 figures, 1 algorithm)


Figures (3)

  • Figure 1: Illustration of the degree bias in the link prediction benchmark. A: A node with degree $k$ appears $k$ times in the edge list, making it $k$ times more likely to be sampled as part of a positive edge than a node with degree 1. B: The degree distribution of the nodes in the positive and negative edges sampled from a Price graph of $N=10^5$ nodes and $M=10^6$ edges. The y-axis, "CCDF", denotes the complementary cumulative distribution function, representing the probability that a node's degree is at least $k$. Dashed lines illustrate the relationship described by Eq. \ref{main:eq:pos-neg-degree-dist}. C: The AUC-ROC score for the Preferential Attachment (PA) method on empirical graphs, with the dashed line indicating the expected AUC-ROC based on the "null" link prediction method that uses only the node degree information (Eq. \ref{main:eq:auc-roc-log-normal}). D: AUC-ROC of 26 methods across 90 graphs. PA outperforms 54% of the methods on average across 90 graphs and outperforms most other methods in the most heterogeneous graphs. E: AUC-ROC of the same methods for the degree-corrected benchmark, showing that nearly 86% of the methods perform better than PA on average.
  • Figure 2: Comparative analysis of link prediction and recommendation benchmarks. A: Rank changes across the original and degree-corrected link prediction benchmarks. The red and blue lines indicate the methods whose rankings change by more than 8 places from the original one. We compute the ranking of the methods based on the AUC-ROC of the link prediction benchmark and compare it against the ranking based on the link retrieval task. B: The degree-corrected benchmark ranks link prediction methods more similarly to the recommendation task than the original benchmark does. The RBO (rank-biased overlap) represents the similarity between the ranking of the link prediction methods and that based on the recommendation task.
  • Figure 3: The degree-corrected benchmark improves GNNs in learning community structure in the LFR graphs. The LFR graphs consist of 3,000 nodes with an average degree of 25. A: The performance of community detection for the LFR graphs with a power-law degree distribution with $\tau_1 = 3.0$ as a function of $\mu$. B: The average performance (measured by the area under the performance curve) shows that the methods trained with the degree-corrected training set outperform their counterparts trained with the traditional link prediction training set. C, D: The same plots for LFR graphs with a fatter-tailed power-law degree distribution with $\tau_1 = 2.5$. The error bars represent the 95% confidence interval estimated by a bootstrap of 1,000 repetitions.
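
The Preferential Attachment baseline evaluated in Figure 1 scores a candidate pair by node degree alone. The sketch below uses the standard definition of the PA score (the product of the two endpoint degrees) and computes AUC-ROC as the probability that a positive pair outscores a negative pair, with ties counted as one half. The function names and the toy data are illustrative assumptions, not taken from the paper's code.

```python
from collections import Counter


def preferential_attachment_scores(train_edges, pairs):
    """Score each candidate pair (u, v) by deg(u) * deg(v) in the training graph."""
    deg = Counter()
    for u, v in train_edges:
        deg[u] += 1
        deg[v] += 1
    return [deg[u] * deg[v] for u, v in pairs]


def auc_roc(pos_scores, neg_scores):
    """AUC-ROC as the fraction of (positive, negative) score pairs in which
    the positive ranks higher, counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy data (hypothetical): training edges, held-out positives, sampled negatives.
train_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
positives = [(0, 4), (1, 3)]
negatives = [(2, 4), (2, 3)]

pos_scores = preferential_attachment_scores(train_edges, positives)
neg_scores = preferential_attachment_scores(train_edges, negatives)
auc = auc_roc(pos_scores, neg_scores)
print(pos_scores, neg_scores, auc)
```

Because the score depends only on degree, any difference between the degree distributions of positives and negatives translates directly into an AUC-ROC above 0.5, even though the method uses no other structural information.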