Quantifying discriminability of evaluation metrics in link prediction for real networks

Shuyan Wan, Yilin Bi, Xinshan Jiao, Tao Zhou

TL;DR

A novel measure is proposed that quantifies the discriminability of any evaluation metric given a real network and an algorithm. It is demonstrated that H-measure and Area Under the ROC Curve (AUC) exhibit the strongest discriminabilities, followed by Normalized Discounted Cumulative Gain (NDCG).

Abstract

Link prediction is one of the most productive branches in network science, aiming to predict links that would have existed but have not yet been observed, or links that will appear during the evolution of the network. Over nearly two decades, the field of link prediction has amassed a substantial body of research, encompassing a plethora of algorithms and diverse applications. For any algorithm, one or more evaluation metrics are required to assess its performance. Because different evaluation metrics can provide different assessments of an algorithm's performance, how to select appropriate evaluation metrics is a fundamental issue in link prediction. To address this issue, we propose a novel measure that quantifies the discriminability of any evaluation metric given a real network and an algorithm. Based on 131 real networks and 20 representative algorithms, we systematically compare the discriminabilities of eight evaluation metrics, and demonstrate that H-measure and Area Under the ROC Curve (AUC) exhibit the strongest discriminabilities, followed by Normalized Discounted Cumulative Gain (NDCG). Our finding is robust for networks in different domains and algorithms of different types. This study provides insights into the selection of evaluation metrics, which may further contribute to standardizing the evaluation process of link prediction algorithms.

Paper Structure (10 sections, 15 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: The survey of the evaluation metrics used in link prediction studies. We selected a total of 50 papers on link prediction, including well-known works [hasan2006, libennowell2007, clauset2008, liu2009, zhou2009, guimerà2009, liu2010, menon2011, dunlavy2011, lichtenwalter2012, cannistraci2013, kuo2013, liu2015, sherkat2015, yang2015, wang2015, trouillon2016, pan2016, pech2017, muscoloni2018, zhang2018, ding2018, yasami2018, kastrin2018, kovács2019, chen2019, pech2019, kumar2019, haghani2019] and some selected studies in the past five years [ahmad2020, kitsak2020, ghasemian2019, liu2020, nasiri2021a, nasiri2021b, kou2021, chen2021, daza2021, berahmand2021, fan2022, cai2022, guo2022, peng2022, balogh2022, nasiri2023, yao2023, mueller2023, hong2023, muscoloni2023, menand2024], and compiled the usage of evaluation metrics in these studies.
  • Figure 2: The procedure of this study. (a) shows the extraction of links from the training set $E^{T}$ under different retention rates $q_{1}$ and $q_{2}$ for the same network, where $q_{1} < q_{2}$. (b) illustrates the calculation of evaluation metric (using AUC as an example) under different retention rates $q_{1}$ and $q_{2}$. (c) depicts the process to obtain the $p$-value matrix $[p_{ij}]_{n \times n}$ by comparing scores assigned by the evaluation metric under different retention rates $q$ based on a large number of independent experiments, where the area of the red region is the discriminability $d$ for the evaluation metric.
  • Figure 3: The average discriminabilities of evaluation metrics with the varying threshold $p^\ast$. The results are obtained by averaging over 20 link prediction algorithms and 131 real networks. Here, we set $T=100$.
  • Figure 4: The rankings and correlations of discriminabilities of evaluation metrics across different domains and link prediction algorithms. In (a) and (d), each column represents one domain or one algorithm, with the numbers on the left indicating the ranks of the discriminabilities within the column. (a) shows the rankings of average discriminabilities for the six domains over the 20 considered algorithms, where Bio., Eco., Inf., Soc., Tec. and Tra. represent the domains of biology, economics, information, sociology, technology and transportation, respectively. (d) displays the rankings of the average discriminabilities for six selected algorithms over the 131 real networks. (b) illustrates the gray correlation coefficients for any two domains using the RA index. (c) shows the gray correlation coefficients for all algorithm pairs from the six selected algorithms. In both cases, darker colors indicate higher correlations. Here, we set $p^{\ast}=0.01$ and $T=100$.
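
The procedure outlined in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The AUC function uses the standard pairwise definition (the probability that a held-out link scores higher than a nonexistent link, with ties counted as 0.5). Everything else is an assumption made for illustration: the captions do not name the statistical test behind the $p$-value matrix, so a two-sided Mann-Whitney U test with a normal approximation is swapped in, and the discriminability $d$ is read as the fraction of retention-rate pairs whose $p$-value falls below the threshold $p^{\ast}$. The function names `auc`, `mann_whitney_p`, and `discriminability` are hypothetical.

```python
import math

def auc(pos_scores, neg_scores):
    """Pairwise AUC: chance that a held-out link outscores a nonexistent link.
    Ties contribute 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def mann_whitney_p(x, y):
    """Two-sided p-value for 'x and y come from the same distribution'.
    ASSUMPTION: this test is a stand-in; the source does not specify one.
    Uses the normal approximation without tie correction."""
    n1, n2 = len(x), len(y)
    u = auc(x, y) * n1 * n2                      # U statistic of x over y
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))    # equals 2 * (1 - Phi(|z|))

def discriminability(samples, p_star=0.01):
    """samples[i] holds the T metric scores observed at retention rate q_i.
    Returns the symmetric p-value matrix and d.
    ASSUMPTION: d is the fraction of pairs (i < j) with p_ij < p_star."""
    n = len(samples)
    p = [[1.0] * n for _ in range(n)]            # diagonal left at 1.0
    significant = 0
    for i in range(n):
        for j in range(i + 1, n):
            p[i][j] = p[j][i] = mann_whitney_p(samples[i], samples[j])
            if p[i][j] < p_star:
                significant += 1
    return p, significant / (n * (n - 1) / 2.0)

# Toy data (made up): metric scores that rise with the retention rate q,
# 20 repeated experiments per rate.
toy = [[q + 0.001 * t for t in range(20)] for q in (0.1, 0.2, 0.3)]
p_matrix, d = discriminability(toy, p_star=0.01)
print(d)
```

Under this reading, a metric whose scores separate cleanly across retention rates yields $d$ close to 1, while a metric that cannot tell the rates apart yields $d$ close to 0.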