Comparing discriminating abilities of evaluation metrics in link prediction

Xinshan Jiao; Shuyan Wan; Qian Liu; Yilin Bi; Yan-Li Lee; En Xu; Dong Hao; Tao Zhou

TL;DR

This work addresses how to assess the discriminating ability of link-prediction evaluation metrics. It introduces a simple artificial network with a tunable noise parameter $\eta$ that continuously adjusts the accuracy of a specifically designed predictor, and uses this setup to compare nine metrics of both threshold-dependent and threshold-free types. The key finding is that $\mathrm{AUC}$, $\mathrm{AUPR}$, and $\mathrm{NDCG}$ show the strongest discrimination across varying noise levels, while $\mathrm{AUC{-}mROC}$ fluctuates the most and the discriminating ability of BP remains relatively limited; the results suggest $\mathrm{NDCG}$ as a robust alternative for imbalanced link-prediction tasks. The study provides a framework for evaluating metric discrimination that can guide metric selection and future metric design, with implications for understanding evaluation biases in sparse networks.

Abstract

Link prediction aims to predict the potential existence of a link between two unconnected nodes in a network based on the known topological characteristics. Evaluation metrics are used to assess the effectiveness of link prediction algorithms, so the discriminating ability of these metrics is vitally important for evaluating algorithms accurately. In this study, we propose an artificial network model in which a single parameter can be adjusted to monotonically and continuously tune the prediction accuracy of a specifically designed link prediction algorithm. Building upon this foundation, we present a framework that characterizes the effectiveness of evaluation metrics by focusing on their discriminating ability. Specifically, we quantitatively compare how well nine evaluation metrics discern varying prediction accuracies: Precision, Recall, F1-Measure, Matthews Correlation Coefficient (MCC), Balanced Precision (BP), the Area Under the receiver operating characteristic Curve (AUC), the Area Under the Precision-Recall curve (AUPR), Normalized Discounted Cumulative Gain (NDCG), and the Area Under the magnified ROC (AUC-mROC). The results indicate that the discriminating abilities of three metrics, AUC, AUPR, and NDCG, are significantly higher than those of the other metrics.
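
To make two of the threshold-free metrics named above concrete, the following is a minimal sketch, not the authors' code. It assumes the predictor's output is already given as a list of binary labels ordered from the highest to the lowest score (1 for a missing link, 0 for a nonexistent link), with no tied scores. It uses the common pairwise definition of AUC and the binary-relevance form of NDCG with a base-2 logarithmic discount. The function names and the toy ranking are illustrative assumptions.

```python
import math


def auc(labels_ranked):
    """AUC of a ranking (best first) of binary labels, assuming no score ties.

    Computed as the fraction of (positive, negative) pairs in which the
    positive item is ranked above the negative one.
    """
    n_pos = sum(labels_ranked)
    n_neg = len(labels_ranked) - n_pos
    correct_pairs = 0
    negatives_below = n_neg  # negatives ranked below the current position
    for label in labels_ranked:
        if label == 1:
            correct_pairs += negatives_below
        else:
            negatives_below -= 1
    return correct_pairs / (n_pos * n_neg)


def ndcg(labels_ranked):
    """NDCG with binary relevance and a 1 / log2(rank + 1) discount."""
    n_pos = sum(labels_ranked)
    dcg = sum(
        label / math.log2(rank + 1)
        for rank, label in enumerate(labels_ranked, start=1)
    )
    # Ideal ranking: all positives occupy the top n_pos positions.
    idcg = sum(1 / math.log2(rank + 1) for rank in range(1, n_pos + 1))
    return dcg / idcg


# Hypothetical toy ranking: positives at ranks 1 and 3 among five candidates.
toy_ranking = [1, 0, 1, 0, 0]
print(auc(toy_ranking), ndcg(toy_ranking))
```

In this toy ranking, five of the six positive-negative pairs are ordered correctly, so the AUC is 5/6. NDCG weights the misplaced positive by its rank, which is why the two metrics can respond differently to the same ranking error.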

Paper Structure (8 sections, 10 equations, 4 figures)

Introduction
Problem Description
Evaluation metrics
Threshold-Dependent Metrics
Threshold-Free Metrics
Discriminating Ability
Results
Discussion

Figures (4)

Figure 1: Illustration of (a) AUC, (b) AUPR, and (c) AUC-mROC.
Figure 2: How the value of a metric varies with changing noise. (a)-(d), (e)-(h) and (i)-(l) respectively represent the results when the thresholds for Precision, Recall, F1-Measure, and MCC are set to $|E^P|/2$, $|E^P|$, and $2|E^P|$. (m)-(p) depict the results for AUC, AUPR, NDCG, and AUC-mROC. The gray points represent the metric values obtained in single simulation runs, the red points represent the average values at each noise intensity, and the error bars indicate the corresponding standard deviations.
Figure 3: The values of the evaluation metrics under different noise intensities, namely $\eta=0.1$, $\eta=0.3$, $\eta=0.5$, $\eta=0.7$ and $\eta=0.9$. The x-axis represents the index of the run, and the y-axis represents the values of the evaluation metrics in the 1000 runs. (a)-(d), (e)-(h) and (i)-(l) represent the results when the thresholds for Precision, Recall, F1-Measure, and MCC are $|E^P|/2$, $|E^P|$, and $2|E^P|$, respectively. (m)-(p) denote the results for AUC, AUPR, NDCG, and AUC-mROC.
Figure 4: The binarized discrimination matrices of different evaluation metrics. The x-axis and y-axis represent the intensity of noise. (a)-(d), (e)-(h) and (i)-(l) respectively depict the results when the thresholds for Precision, Recall, F1-Measure, and MCC are $|E^P|/2$, $|E^P|$, and $2|E^P|$. (m)-(p) illustrate the outcomes for AUC, AUPR, NDCG, and AUC-mROC.
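
The captions above refer to three thresholds, $|E^P|/2$, $|E^P|$, and $2|E^P|$, for the threshold-dependent metrics. As a minimal sketch, and not the authors' implementation, the snippet below assumes that a threshold $k$ means the top-$k$ ranked node pairs are predicted as links, and that $|E^P|$ is the number of positive (missing) links in the ranking. It then computes Precision, Recall, F1-Measure, and MCC from the standard confusion-matrix definitions. The function name and the toy ranking are hypothetical.

```python
import math


def threshold_metrics(labels_ranked, k):
    """Precision, Recall, F1 and MCC when the top-k ranked pairs are predicted as links.

    labels_ranked: binary labels ordered from highest to lowest score
    (1 = missing link, 0 = nonexistent link).
    """
    n = len(labels_ranked)
    n_pos = sum(labels_ranked)
    tp = sum(labels_ranked[:k])  # positives inside the top-k
    fp = k - tp                  # negatives inside the top-k
    fn = n_pos - tp              # positives outside the top-k
    tn = n - k - fn              # negatives outside the top-k
    precision = tp / k
    recall = tp / n_pos
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, f1, mcc


# Hypothetical toy ranking with 4 positives among 12 candidate pairs.
toy_ranking = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
n_pos = sum(toy_ranking)
for k in (n_pos // 2, n_pos, 2 * n_pos):
    print(k, threshold_metrics(toy_ranking, k))
```

Under this top-$k$ assumption, Precision and Recall coincide at $k=|E^P|$, since both divide the same number of true positives by $|E^P|$.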
