
BASIC: Bipartite Assisted Spectral-clustering for Identifying Communities in Large-scale Networks

Tianchen Gao, Jingyuan Liu, Rui Pan, Ao Sun

TL;DR

Bipartite Assisted Spectral-clustering for Identifying Communities (BASIC) addresses the challenge of detecting communities in large networks when community signals are weak. It couples the primary network with multiple bipartite networks through the aggregated squared matrix $\mathbf{M} = \mathbf{A}\mathbf{A}^{\top} + \sum_{q=1}^{Q} \mathbf{B}^{(q)}\mathbf{B}^{(q)\top}$ and applies SCORE-normalized spectral clustering to $\mathbf{M}$, which mitigates degree heterogeneity and yields robust community labels. The authors derive non-asymptotic mis-clustering bounds, show that integrating the signal-to-noise ratio (SNR) across all networks improves learning performance, and prove that BASIC's upper error bound is tighter than the one based on the primary network alone. Empirically, BASIC is validated through simulations and on a real author collaboration dataset derived from the Web of Science, where incorporating author-paper, author-institution, and author-region information yields more balanced and interpretable communities. The work advances multi-layer network analysis by formally leveraging bipartite side information to improve community detection under weak signals, with practical applicability to large-scale scientific collaboration networks.
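The pipeline described in the TL;DR can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: `basic_sketch` and `kmeans` are hypothetical names, $\mathbf{M}$ is formed literally as written (diagonal kept), the SCORE step divides the 2nd through $K$-th leading eigenvectors entrywise by the first and truncates the ratios at $\pm\log n$ as in the original SCORE recipe, and a plain Lloyd's k-means stands in for whatever clustering routine the paper uses.

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random restarts (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assign every row to its nearest center (squared Euclidean distance).
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Move each center to the mean of its rows; keep it if the cluster is empty.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Final assignment and within-cluster sum of squares for this restart.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        inertia = dist.min(axis=1).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def basic_sketch(A, Bs, K, seed=0):
    """Hypothetical BASIC-style pipeline: aggregate, embed, SCORE-normalize, cluster."""
    A = np.asarray(A, dtype=float)
    # Aggregated squared matrix M = A A^T + sum_q B^(q) B^(q)T (symmetric PSD).
    M = A @ A.T
    for B in Bs:
        B = np.asarray(B, dtype=float)
        M = M + B @ B.T
    # K leading eigenpairs of M (eigh returns eigenvalues in ascending order).
    vals, vecs = np.linalg.eigh(M)
    U = vecs[:, np.argsort(vals)[::-1][:K]]
    # SCORE: fix the sign of the first eigenvector, then take entrywise ratios.
    u1 = U[:, 0] * np.sign(U[:, 0].sum())
    u1 = np.where(np.abs(u1) < 1e-12, 1e-12, u1)  # guard against zero entries
    R = U[:, 1:] / u1[:, None]
    # Truncate extreme ratios at +/- log(n), following the original SCORE recipe.
    t = np.log(A.shape[0])
    R = np.clip(R, -t, t)
    return kmeans(R, K, seed=seed)
```

Calling `basic_sketch(A, [B1, B2, B3], K)` returns one label per row of `A`; passing `Bs=[]` runs the same steps on $\mathbf{A}\mathbf{A}^{\top}$ alone, which gives a primary-network-only comparison in the spirit of the baseline.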

Abstract

Community detection, which focuses on recovering the group structure within networks, is a crucial and fundamental task in network analysis. However, detection can be quite challenging and unstable when community signals are weak. Motivated by a newly collected large-scale academic network dataset from the Web of Science, which includes multi-layer network information, we propose a Bipartite Assisted Spectral-clustering approach for Identifying Communities (BASIC), which incorporates bipartite network information into the community structure learning of the primary network. The gains in accuracy and stability offered by BASIC are validated theoretically within the degree-corrected stochastic block model framework, as well as numerically through extensive simulation studies. We rigorously study the convergence rate of BASIC even under weak-signal scenarios and prove that BASIC yields a tighter upper error bound than that based on the primary network information alone. We apply BASIC to the newly collected large-scale academic network dataset of statistics papers. When learning the structure of the author collaboration network, we incorporate bipartite network information from author-paper, author-institution, and author-region relationships. From both statistical and interpretative perspectives, these bipartite networks greatly aid in identifying communities within the primary collaboration network.
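Experimenting with the weak-signal settings discussed in the abstract and the simulation studies requires synthetic networks. The sketch below samples an undirected degree-corrected stochastic block model (DCSBM) for the primary network and a degree-corrected bipartite block model for the side networks. The bipartite model, the function names `sample_dcsbm` and `sample_bipartite`, and the parameters `theta`, `eta`, `P`, and `Pb` are our assumptions for illustration; the paper's exact simulation design may differ.

```python
import numpy as np


def sample_dcsbm(theta, z, P, rng):
    """Undirected DCSBM: A_ij ~ Bernoulli(theta_i * theta_j * P[z_i, z_j]) for i < j."""
    prob = np.clip(theta[:, None] * theta[None, :] * P[np.ix_(z, z)], 0.0, 1.0)
    # Sample the strict upper triangle only, then mirror it (no self-loops).
    upper = np.triu(rng.random(prob.shape) < prob, k=1)
    return (upper | upper.T).astype(np.int8)


def sample_bipartite(theta, z, c, Pb, rng, eta=None):
    """Hypothetical degree-corrected bipartite block model:
    B_ij ~ Bernoulli(theta_i * eta_j * Pb[z_i, c_j]) for node i and item j,
    where c holds the (assumed) block label of each item."""
    if eta is None:
        eta = np.ones(len(c))
    prob = np.clip(theta[:, None] * eta[None, :] * Pb[np.ix_(z, c)], 0.0, 1.0)
    return (rng.random(prob.shape) < prob).astype(np.int8)
```

Shrinking the gap between the diagonal and off-diagonal entries of `P` weakens the primary signal, while keeping a wide gap in `Pb` produces strong bipartite networks, loosely matching the contrast drawn in Figures 2 and 3.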

Paper Structure

This paper contains 23 sections, 8 theorems, 44 equations, 8 figures, 3 tables, 1 algorithm.

Key Result

Proposition 1

Let $\boldsymbol{\Omega}_{M} = \mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^{\top}$ be the compact eigenvalue decomposition of $\boldsymbol{\Omega}_{M}$. Then the $i$-th leading eigenvalue of $\boldsymbol{\Omega}_{M}$ is … Further, let $\bar{\mathbf{S}}=\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^{\top}$ be the eigenvalue decomposition of $\bar{\mathbf{S}}$. Then the $i$-th row of the eigenvectors of …
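The statement above is truncated, but why SCORE normalization works at the population level can be checked numerically under one assumption of ours: that the population aggregated matrix factors as $\boldsymbol{\Omega}_{M} = \boldsymbol{\Theta}\boldsymbol{\Pi}\bar{\mathbf{S}}\boldsymbol{\Pi}^{\top}\boldsymbol{\Theta}$, with $\boldsymbol{\Theta}$ the diagonal matrix of degree parameters, $\boldsymbol{\Pi}$ the $n\times K$ membership matrix, and $\bar{\mathbf{S}}$ a full-rank $K\times K$ matrix ($\boldsymbol{\Theta}$ and $\boldsymbol{\Pi}$ are our notation). Under that form, every row of $\mathbf{U}$ equals $\theta_i$ times a vector that depends only on node $i$'s community, so the entrywise ratios against the first eigenvector coincide within each community. The function name `population_score_ratios` and the numbers in the check are hypothetical.

```python
import numpy as np


def population_score_ratios(theta, z, S_bar):
    """Build Omega_M = Theta Pi S_bar Pi^T Theta (assumed form) and return
    (Omega_M, U, Lambda, SCORE ratio matrix)."""
    n, K = len(theta), S_bar.shape[0]
    Pi = np.zeros((n, K))
    Pi[np.arange(n), z] = 1.0          # one-hot community memberships
    Theta = np.diag(theta)
    Omega = Theta @ Pi @ S_bar @ Pi.T @ Theta
    # Compact eigendecomposition: keep only the K leading (nonzero) eigenpairs.
    vals, vecs = np.linalg.eigh(Omega)
    order = np.argsort(vals)[::-1][:K]
    U, Lam = vecs[:, order], np.diag(vals[order])
    # SCORE ratios: eigenvectors 2..K divided entrywise by the (sign-fixed) first one.
    u1 = U[:, 0] * np.sign(U[:, 0].sum())
    R = U[:, 1:] / u1[:, None]
    return Omega, U, Lam, R
```

With, for example, `theta` drawn from $[0.2, 1]$, three equal-size communities, and a positive-definite $\bar{\mathbf{S}}$ with positive entries, $\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\top}$ reproduces $\boldsymbol{\Omega}_{M}$ and the rows of `R` collapse to one point per community, which is what the k-means step then recovers.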

Figures (8)

  • Figure 1: Schematic diagram of the collaboration network, author-institution network, and author-region network. The structural information of author nodes within these bipartite networks can be leveraged to aid in detecting communities within the collaboration network.
  • Figure 2: ARI for weak-signal primary networks. The baseline refers to the results obtained by using only the primary network. Cases 1, 2, 3, and 4 correspond to using 0, 1, 2, and 3 strong bipartite networks, respectively.
  • Figure 3: ARI for strong-signal primary networks. The baseline refers to the results obtained by using only the primary network. Cases 1, 2, 3, and 4 correspond to using 0, 1, 2, and 3 bipartite networks with the same signal strength as the primary network, respectively.
  • Figure 4: The largest connected component of Community 2 in the collaboration network.
  • Figure 5: The largest connected component of Community 5 in the collaboration network.
  • ...and 3 more figures
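Figures 2 and 3 report the adjusted Rand index (ARI), which compares an estimated partition with the true one after correcting for chance agreement: it equals 1 for identical partitions up to relabeling and is near 0 (possibly negative) for random labelings. The following is a from-scratch contingency-table computation of ARI using the Hubert and Arabie formula, included for illustration; `adjusted_rand_index` is our own helper, and in practice `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
import numpy as np


def comb2(x):
    """Number of unordered pairs, x choose 2, elementwise."""
    return x * (x - 1) / 2.0


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the contingency table of two labelings."""
    _, t = np.unique(np.asarray(labels_true), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(table, (t, p), 1)                  # count co-assigned nodes
    sum_cells = comb2(table).sum()               # pairs together in both labelings
    sum_rows = comb2(table.sum(axis=1)).sum()    # pairs together in labels_true
    sum_cols = comb2(table.sum(axis=0)).sum()    # pairs together in labels_pred
    expected = sum_rows * sum_cols / comb2(len(t))
    max_index = (sum_rows + sum_cols) / 2.0
    if max_index == expected:                    # both labelings trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

For instance, `adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1])` evaluates to 4/7 (about 0.571), and swapping label names, as in `[0, 0, 1, 1]` versus `[1, 1, 0, 0]`, still gives 1.0.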

Theorems & Definitions (8)

  • Proposition 1
  • Lemma 1
  • Theorem 1
  • Lemma 2
  • Proposition 2
  • Lemma 3
  • Lemma 4
  • Lemma 5