Uncovering multi-order Popularity and Similarity Mechanisms in Link Prediction by graphlet predictors
Yong-Jian He, Yijun Ran, Zengru Di, Tao Zhou, Xiao-Ke Xu
TL;DR
This work introduces graphlet orbit degrees as a unified, multi-order representation of popularity and similarity mechanisms for link prediction. By expressing traditional indices through node- and edge-orbit degrees and fusing them with XGBoost, the proposed OD framework achieves state-of-the-art performance across 550 real-world networks from six domains, while SHAP analyses make its predictions interpretable. The results reveal a dominant role for first-order similarity (notably M2 in social networks) and domain-specific patterns (e.g., M3 in economic, technological, and information networks), whereas no single feature dominates biological or transportation networks. Overall, the approach provides both higher predictive accuracy and deeper mechanistic insight into how network structure drives link formation, with broad applicability to network analysis tasks beyond link prediction.
Abstract
Link prediction has become a critical problem in network science and has attracted increasing research interest. Popularity and similarity are two primary mechanisms in the formation of real networks. However, the roles these mechanisms play in link prediction across networks from different domains remain poorly understood. Accordingly, this study used orbit degrees of graphlets to construct multi-order popularity- and similarity-based link predictors, demonstrating that traditional popularity- and similarity-based indices can be efficiently represented in terms of orbit degrees. Moreover, we designed a supervised learning model that fuses multiple orbit-degree-based features and validated its link prediction performance. We also evaluated the mean absolute SHapley Additive exPlanations (SHAP) value of each feature within this model across 550 real-world networks from six domains. We observed that the homophily mechanism, a similarity-based feature, dominated social networks, with a win rate of 91%. A different similarity-based feature was prominent in economic, technological, and information networks, whereas no single feature dominated the biological and transportation networks. The proposed approach improves both the accuracy and the interpretability of link prediction, thus facilitating the analysis of complex networks.
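As a rough illustration of how traditional indices relate to small-graphlet counts, the sketch below computes two classical link-prediction scores for a candidate node pair on a toy graph. It relies only on two standard facts: a node's degree is the number of edges (2-node graphlets) it touches, and the common-neighbor count of a pair equals the number of triangles the pair would close if linked. The function names, the feature keys, and the toy graph are assumptions made for this example; they are not the paper's orbit-degree definitions or its feature set.

```python
# Minimal sketch (illustrative only): two classical indices read off
# from low-order graphlet counts on an undirected simple graph.

def build_adjacency(edges):
    """Build an adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def pair_features(adj, u, v):
    """Popularity- and similarity-style features for a candidate pair (u, v).

    deg_u, deg_v : number of edges each endpoint touches (popularity).
    pa           : preferential attachment index, deg_u * deg_v.
    cn           : common neighbors, i.e. triangles the link u-v would close
                   (similarity).
    """
    deg_u = len(adj[u])
    deg_v = len(adj[v])
    return {
        "deg_u": deg_u,
        "deg_v": deg_v,
        "pa": deg_u * deg_v,
        "cn": len(adj[u] & adj[v]),
    }

# Hypothetical toy graph: a triangle a-b-c with a tail c-d-e.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
toy_adj = build_adjacency(toy_edges)
features_ad = pair_features(toy_adj, "a", "d")
features_ae = pair_features(toy_adj, "a", "e")
```

Feature dictionaries of this kind, computed for observed and unobserved pairs, could then be stacked into a table and passed to a gradient-boosted classifier, with per-feature attributions summarized afterwards; that step is omitted here because the paper's exact feature set and training protocol are not given in this section.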
