Comparing discriminating abilities of evaluation metrics in link prediction
Xinshan Jiao, Shuyan Wan, Qian Liu, Yilin Bi, Yan-Li Lee, En Xu, Dong Hao, Tao Zhou
TL;DR
This work addresses how to assess the discriminating ability of link-prediction evaluation metrics. It introduces a simple artificial network model with a tunable noise parameter $\eta$ that continuously adjusts the accuracy of a purpose-built predictor, and uses this setup to compare nine metrics of both threshold-dependent and threshold-free types. The key finding is that $\mathrm{AUC}$, $\mathrm{AUPR}$, and $\mathrm{NDCG}$ show the strongest discrimination across noise levels, whereas $\mathrm{AUC{-}mROC}$ fluctuates the most and Balanced Precision (BP) remains relatively limited; the results support recommending $\mathrm{NDCG}$ as a robust alternative for imbalanced link-prediction tasks. The study provides a framework for evaluating metric discrimination that can guide metric selection and future metric design, with implications for understanding evaluation biases in sparse networks.
Abstract
Link prediction aims to predict the potential existence of links between pairs of unconnected nodes in a network, based on the known topological characteristics. Evaluation metrics are used to assess the effectiveness of link prediction algorithms, so the discriminating ability of these metrics is vitally important for evaluating algorithms accurately. In this study, we propose an artificial network model in which a single parameter monotonically and continuously tunes the prediction accuracy of a specifically designed link prediction algorithm. Building upon this foundation, we present a framework that characterizes the effectiveness of evaluation metrics by focusing on their discriminating ability. Specifically, we quantitatively compare how well nine evaluation metrics discern varying prediction accuracies: Precision, Recall, F1-Measure, Matthews Correlation Coefficient (MCC), Balanced Precision (BP), the Area Under the receiver operating characteristic Curve (AUC), the Area Under the Precision-Recall curve (AUPR), Normalized Discounted Cumulative Gain (NDCG), and the Area Under the magnified ROC (AUC-mROC). The results indicate that the discriminating abilities of three metrics, AUC, AUPR, and NDCG, are significantly higher than those of the other metrics.
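
To make the distinction between threshold-dependent and threshold-free metrics concrete, the following minimal sketch computes Precision at a cutoff, AUC, and NDCG from a ranked list of candidate links. It uses the standard textbook definitions with binary relevance, which may differ in detail from the formulations in the paper; the function names and the toy ranking are hypothetical and are not taken from the study.

```python
import math

def precision_at_k(ranked_labels, k):
    """Threshold-dependent: fraction of true links among the top-k predictions."""
    return sum(ranked_labels[:k]) / k

def auc(ranked_labels):
    """Threshold-free: probability that a random true link is ranked above
    a random non-link (assumes a strict ranking with no ties)."""
    n_pos = sum(ranked_labels)
    n_neg = len(ranked_labels) - n_pos
    correct_pairs = 0
    negatives_below = 0
    # Walk from the bottom of the ranking upward, counting the non-links
    # that sit below each true link.
    for label in reversed(ranked_labels):
        if label == 1:
            correct_pairs += negatives_below
        else:
            negatives_below += 1
    return correct_pairs / (n_pos * n_neg)

def ndcg(ranked_labels):
    """Threshold-free: discounted gain of the ranking, normalized by the
    gain of an ideal ranking that places all true links first."""
    dcg = sum(label / math.log2(rank + 1)
              for rank, label in enumerate(ranked_labels, start=1))
    n_pos = sum(ranked_labels)
    idcg = sum(1 / math.log2(rank + 1) for rank in range(1, n_pos + 1))
    return dcg / idcg

# Hypothetical toy ranking: 1 = true missing link, 0 = non-existent link,
# ordered from highest to lowest predicted score.
ranked = [1, 0, 1, 0, 0, 0]
print(precision_at_k(ranked, 2))  # 0.5
print(auc(ranked))                # 0.875
print(ndcg(ranked))               # roughly 0.92
```

In this toy ranking, Precision at cutoff 2 only sees the top two positions, whereas AUC and NDCG respond to the position of every true link, with NDCG weighting errors near the top more heavily.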
