BASIC: Bipartite Assisted Spectral-clustering for Identifying Communities in Large-scale Networks
Tianchen Gao, Jingyuan Liu, Rui Pan, Ao Sun
TL;DR
Bipartite Assisted Spectral-clustering for Identifying Communities (BASIC) addresses the challenge of detecting communities in large networks when community signals are weak. It couples the primary network with Q bipartite networks through the aggregated squared matrix $M = AA^{\top} + \sum_{q=1}^{Q} B^{(q)} B^{(q)\top}$, where A is the adjacency matrix of the primary network and B^{(q)} is the bipartite adjacency matrix linking the same nodes to the q-th set of auxiliary entities (e.g., papers, institutions, or regions). It then applies SCORE-normalized spectral clustering to M, which mitigates degree heterogeneity and yields robust community labels. The authors derive non-asymptotic bounds on the mis-clustering rate, show that pooling the signal-to-noise ratio (SNR) across all networks improves learning performance, and prove that BASIC's error bound is never looser than the bound obtained from the primary network alone. Empirically, BASIC is validated through simulations and on a real author collaboration network derived from Web of Science records of statistics papers, where incorporating author-paper, author-institution, and author-region information yields more balanced and interpretable communities. The work advances multi-layer network analysis by formally leveraging bipartite side information to strengthen community detection under weak signals, and it applies directly to large-scale scientific collaboration networks.
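The pipeline summarized above (aggregate, eigendecompose, SCORE-normalize, cluster) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes dense 0/1 matrices, a known number of communities K, and the unweighted sum in the formula for M with no diagonal correction. The ratio truncation at ±log(n) follows Jin's original SCORE proposal, and plain Lloyd's k-means stands in for whatever clustering step the paper uses. The names `basic_cluster` and `kmeans` are hypothetical.

```python
import numpy as np


def kmeans(X, K, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random restarts; returns the lowest-cost labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=K, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Keep the old center if a cluster becomes empty
            new_centers = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                for k in range(K)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        cost = dist.min(axis=1).sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


def basic_cluster(A, Bs, K, seed=0):
    """Hypothetical BASIC-style sketch.

    A: (n, n) symmetric primary adjacency; Bs: list of (n, p_q) bipartite
    adjacency matrices; K: assumed-known number of communities.
    """
    n = A.shape[0]
    # Aggregated squared matrix M = A A^T + sum_q B^(q) B^(q)T (symmetric PSD)
    M = A @ A.T
    for B in Bs:
        M = M + B @ B.T
    # eigh returns ascending eigenvalues; reverse to take the leading K
    _, vecs = np.linalg.eigh(M)
    xi = vecs[:, ::-1][:, :K]
    if K == 1:
        return np.zeros(n, dtype=int)
    # SCORE: divide eigenvectors 2..K entrywise by the leading one,
    # which cancels the node-specific degree factors
    lead = xi[:, 0].copy()
    lead[np.abs(lead) < 1e-12] = 1e-12  # guard against ~0 entries (e.g. isolated nodes)
    R = xi[:, 1:] / lead[:, None]
    # Truncate extreme ratios at +/- log(n), as in the original SCORE proposal
    R = np.clip(R, -np.log(n), np.log(n))
    return kmeans(R, K, seed=seed)
```

Passing `Bs=[]` reduces the sketch to SCORE on A A^T alone, which serves as a primary-network-only baseline for comparison.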
Abstract
Community detection, which aims to recover the group structure within a network, is a fundamental task in network analysis. However, detection can be challenging and unstable when community signals are weak. Motivated by a newly collected large-scale academic network dataset from the Web of Science that contains multi-layer network information, we propose a Bipartite Assisted Spectral-clustering approach for Identifying Communities (BASIC), which incorporates bipartite network information into learning the community structure of the primary network. The gains in accuracy and stability offered by BASIC are validated theoretically under the degree-corrected stochastic block model (DCSBM) framework and numerically through extensive simulation studies. We rigorously establish the convergence rate of BASIC, even in weak-signal scenarios, and prove that BASIC attains a tighter upper error bound than the bound based on the primary network alone. We then apply BASIC to the newly collected large-scale academic network dataset of statistics papers. When learning the structure of the author collaboration network, we incorporate bipartite network information from author-paper, author-institution, and author-region relationships. From both statistical and interpretative perspectives, these bipartite networks greatly aid in identifying communities within the primary collaboration network.
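The degree-corrected stochastic block model underlying the theory is straightforward to simulate, which is how weak-signal settings are typically explored numerically. The minimal NumPy sketch below draws the primary network from the standard DCSBM, with P(A_ij = 1) = θ_i θ_j P_{z_i z_j} for i ≠ j. Each bipartite layer uses a hypothetical model, P(B_ij = 1) = θ_i Q^{(q)}_{z_i j}, chosen only for illustration; the paper's own bipartite generative model and simulation settings may differ. The helper name `simulate_dcsbm` and its arguments are hypothetical, and the sketch assumes that every resulting probability lies in [0, 1].

```python
import numpy as np


def simulate_dcsbm(n, P, theta, Qs, seed=0):
    """Draw a DCSBM primary network plus bipartite layers (hypothetical helper).

    n: number of primary nodes; P: (K, K) symmetric block matrix;
    theta: (n,) degree parameters; Qs: list of (K, m_q) matrices, one per
    bipartite layer. Assumes all resulting probabilities lie in [0, 1].
    """
    rng = np.random.default_rng(seed)
    K = P.shape[0]
    z = rng.integers(K, size=n)  # community labels drawn uniformly at random
    # DCSBM: P(A_ij = 1) = theta_i * theta_j * P[z_i, z_j] for i != j
    omega = np.outer(theta, theta) * P[z][:, z]
    upper = np.triu(rng.random((n, n)) < omega, k=1)  # sample each pair once
    A = (upper | upper.T).astype(float)               # symmetric, zero diagonal
    Bs = []
    for Q in Qs:
        # Assumed bipartite model: P(B_ij = 1) = theta_i * Q[z_i, j]
        probs = theta[:, None] * Q[z]
        Bs.append((rng.random(probs.shape) < probs).astype(float))
    return A, Bs, z
```

Moving the diagonal of P toward its off-diagonal values weakens the community signal in A, while the bipartite layers can still carry signal through Q. This reproduces the kind of weak-signal regime the abstract targets.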
