A Community Detection and Graph Neural Network Based Link Prediction Approach for Scientific Literature
Chunjiang Liu, Yikun Han, Haiyun Xu, Shihan Yang, Kaidi Wang, Yongye Su
TL;DR
The paper tackles link prediction in scientific literature networks by combining Louvain community detection with Graph Neural Networks, yielding consistent gains across GNN architectures such as GAT, GCN, and GraphSAGE. The authors uncover latent community structure and augment node features with community labels, which improves predictive metrics, most notably AUC (e.g., GCNv2 with Louvain rises from 0.892 to 0.919), with strong performance in comparisons spanning heuristic, ML, and GNN baselines. The approach is evaluated on a zinc-battery literature corpus (10,187 papers) using an 80:20 train-test split and balanced non-links, which highlights its practical applicability to large-scale, real-world networks. The work suggests that incorporating community-level information can significantly enhance link prediction, with potential implications for paper recommendation, collaboration forecasting, and broader network analysis.
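The evaluation protocol summarized above (an 80:20 split of links, balanced non-links, AUC as the metric) can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_edges_with_negatives` and `auc`, the uniform sampling of non-links, and the toy graph are all assumptions made for illustration, and enumerating every node pair is only practical for small graphs.

```python
import random
from itertools import combinations


def split_edges_with_negatives(edges, nodes, train_frac=0.8, seed=0):
    """Split links train/test and pair each part with equally many non-links.

    Hypothetical helper: the summary does not say how non-links were sampled,
    so uniform sampling over absent node pairs is assumed here.
    """
    rng = random.Random(seed)
    # Treat the graph as undirected: (u, v) and (v, u) are the same link.
    positives = sorted({tuple(sorted(e)) for e in edges})
    rng.shuffle(positives)
    cut = int(train_frac * len(positives))
    train_pos, test_pos = positives[:cut], positives[cut:]

    existing = set(positives)
    # All node pairs that are not links (feasible only for small graphs).
    non_links = [p for p in combinations(sorted(nodes), 2) if p not in existing]
    rng.shuffle(non_links)
    train_neg = non_links[:len(train_pos)]
    test_neg = non_links[len(train_pos):len(train_pos) + len(test_pos)]
    return (train_pos, train_neg), (test_pos, test_neg)


def auc(pos_scores, neg_scores):
    """AUC: probability that a random link outscores a random non-link.

    Ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy path graph with six nodes and five links (illustrative data only).
toy_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
(train_pos, train_neg), (test_pos, test_neg) = split_edges_with_negatives(
    toy_edges, range(1, 7)
)
```

Any link predictor can then be scored by passing its scores for `test_pos` and `test_neg` to `auc`; a value of 0.5 corresponds to random ranking and 1.0 to perfect separation.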
Abstract
This study presents a novel approach that synergizes community detection algorithms with various Graph Neural Network (GNN) models to bolster link prediction in scientific literature networks. By integrating the Louvain community detection algorithm into our GNN frameworks, we consistently enhance performance across all models tested. For example, integrating Louvain with the GAT model resulted in an AUC score increase from 0.777 to 0.823, exemplifying the typical improvements observed. Similar gains are noted when Louvain is paired with other GNN architectures, confirming the robustness and effectiveness of incorporating community-level insights. This consistent uplift in performance, reflected in our extensive experimentation on bipartite graphs of scientific collaborations and citations, highlights the synergistic potential of combining community detection with GNNs to overcome common link prediction challenges such as scalability and resolution limits. Our findings advocate for the integration of community structures as a significant step forward in the predictive accuracy of network science models, offering a comprehensive understanding of scientific collaboration patterns through the lens of advanced machine learning techniques.
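To make the integration of community structure more concrete, the sketch below shows two small pieces: the modularity score that Louvain greedily optimizes, and one way a community assignment could be attached to node features before they reach a GNN. It is an illustration under stated assumptions, not the authors' implementation: the one-hot encoding, the function names `modularity` and `augment_with_community`, and the hand-written partition (standing in for real Louvain output) are all hypothetical.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of (L_c / m) - (d_c / (2m))^2, where m is the
    number of edges, L_c the number of edges inside c, and d_c the total
    degree of the nodes in c.
    """
    m = len(edges)
    intra = {}   # edges with both endpoints in the same community
    degree = {}  # summed node degree per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def augment_with_community(features, community_of):
    """Append a one-hot community indicator to each node's feature vector.

    One-hot encoding is an assumption; the source only states that community
    labels augment the node features.
    """
    # Map arbitrary community ids to contiguous column indices.
    community_ids = sorted(set(community_of.values()))
    column = {cid: i for i, cid in enumerate(community_ids)}
    augmented = {}
    for node, vec in features.items():
        one_hot = [0.0] * len(community_ids)
        one_hot[column[community_of[node]]] = 1.0
        augmented[node] = list(vec) + one_hot
    return augmented


# Two triangles joined by a single edge (illustrative data only).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
# Hand-written partition standing in for Louvain output.
toy_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
toy_features = {n: [0.5] for n in toy_partition}

q = modularity(toy_edges, toy_partition)
augmented = augment_with_community(toy_features, toy_partition)
```

For this toy graph each triangle holds 3 of the 7 edges and has total degree 7, so Q = 2 × (3/7 − (7/14)²) = 5/14 ≈ 0.357, and each augmented vector gains two extra columns, one per community.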
