Machine Learning Informed by Micro and Mesoscopic Statistical Physics Methods for Community Detection
Yijun Ran, Junfan Yi, Wei Si, Michael Small, Ke-ke Shang
TL;DR
This work addresses a limitation of mesoscopic-only community detection by using ensemble learning to embed micro-level node-pair similarities into mesoscopic structures. The authors build a framework that samples first- and second-order node pairs, computes microscopic features (degree heterogeneity, clustering heterogeneity, and common neighbors), and trains decision tree (DT), random forest (RF), and XGBoost models to estimate pairwise similarity; the estimated similarity is squared and integrated into a weighted similarity network used for the final detection. Across artificial and real networks, the approach yields higher modularity $Q$ (and $Q^w$) and better agreement with ground truth as measured by normalized mutual information ($NMI$) and the adjusted Rand index ($ARI$), with the strongest gains when ground-truth labels are available. Correlations between node-pair similarity and the evaluation metrics support the central premise. The results illustrate a productive synergy between machine learning and statistical-physics methods, offering a scalable, robust path to uncovering real-world community structures and suggesting a teacher-student dynamic in which physics-based insights guide learning and learning, in turn, refines physics-based detection.
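The feature-extraction step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "first-order" pairs are linked nodes, that "second-order" pairs are unlinked nodes sharing at least one neighbor, and that each heterogeneity feature is an absolute difference between the two nodes' values. The function names and the toy edge list are hypothetical, and model training (DT, RF, XGBoost) is omitted.

```python
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering(adj, node):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def node_pairs(adj):
    """Yield (u, v, order): 1 for linked pairs, 2 for unlinked pairs with a shared neighbor."""
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            yield u, v, 1
        elif adj[u] & adj[v]:
            yield u, v, 2

def pair_features(adj, u, v):
    """Microscopic features of a node pair (definitions assumed for illustration)."""
    return {
        "common_neighbors": len(adj[u] & adj[v]),
        "degree_heterogeneity": abs(len(adj[u]) - len(adj[v])),
        "clustering_heterogeneity": abs(clustering(adj, u) - clustering(adj, v)),
    }

# Toy graph (hypothetical): a triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = build_adjacency(edges)
features = {
    (u, v): (order, pair_features(adj, u, v)) for u, v, order in node_pairs(adj)
}
```

In a full pipeline, feature vectors like these would be passed to the trained models, and the squared similarity predicted for each pair would become its weight in the similarity network.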
Abstract
Community detection plays a crucial role in understanding the structural organization of complex networks. Previous methods, particularly those from statistical physics, focus primarily on mesoscopic network structures and often struggle to integrate fine-grained node similarities. To address this limitation, we propose a low-complexity framework that uses machine learning to embed micro-level node-pair similarities into mesoscopic community structures. By leveraging ensemble learning models, our approach enhances both structural coherence and detection accuracy. Experimental evaluations on artificial and real-world networks demonstrate that our framework consistently outperforms conventional methods, achieving higher modularity and higher accuracy as measured by NMI and ARI. Notably, when ground-truth labels are available, our approach yields the most accurate detection results, effectively recovering real-world community structures while minimizing misclassifications. To further explain our framework's performance, we analyze the correlation between node-pair similarity and the evaluation metrics. The results reveal a strong and statistically significant correlation, underscoring the critical role of node-pair similarity in enhancing detection accuracy. Overall, our findings highlight the synergy between machine learning and statistical physics, demonstrating how machine learning techniques can enhance network analysis and uncover complex structural patterns.
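To make the modularity metric concrete, the sketch below computes the standard weighted modularity, $Q = \sum_c \left[ W_c / W - (S_c / 2W)^2 \right]$, where $W$ is the total edge weight, $W_c$ the weight inside community $c$, and $S_c$ the summed strength of its nodes. Treating the weighted variant as this standard definition is an assumption, as are the function name, the toy graph, and the similarity values (0.9 within groups, 0.3 on the bridge), which are invented for illustration and are not results from the paper.

```python
def weighted_modularity(weighted_edges, community):
    """Weighted modularity of a partition of an undirected graph without self-loops.

    weighted_edges: iterable of (u, v, w) tuples; community: dict node -> label.
    """
    total = 0.0      # W: sum of all edge weights
    strength = {}    # s_i: sum of weights on edges incident to node i
    internal = {}    # W_c: sum of weights on edges inside community c
    for u, v, w in weighted_edges:
        total += w
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
        if community[u] == community[v]:
            label = community[u]
            internal[label] = internal.get(label, 0.0) + w
    comm_strength = {}  # S_c: summed strength of the nodes in community c
    for node, s in strength.items():
        label = community[node]
        comm_strength[label] = comm_strength.get(label, 0.0) + s
    return sum(
        internal.get(label, 0.0) / total - (s / (2.0 * total)) ** 2
        for label, s in comm_strength.items()
    )

# Toy graph (hypothetical): two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

# Unweighted modularity: every edge has weight 1.
q = weighted_modularity([(u, v, 1.0) for u, v in edges], community)

# Weighted case: squared, made-up similarities serve as edge weights.
similarity = {e: (0.3 if e == (2, 3) else 0.9) for e in edges}
qw = weighted_modularity([(u, v, similarity[(u, v)] ** 2) for u, v in edges], community)
```

In this toy case, squaring the similarities shrinks the bridge weight relative to the within-group weights, so the same partition scores a higher modularity on the weighted network than on the unweighted one.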
