Balanced Stochastic Block Model for Community Detection in Signed Networks

Yichao Chen, Weijing Tang, Ji Zhu

TL;DR

The paper tackles community detection in signed networks by introducing the Balanced Stochastic Block Model (BSBM), which imposes population-level balance on community signs through a hierarchy of two meta-groups. It develops a fast, EM-based profile pseudo-likelihood estimator that decouples row and column labels, with a constrained update for the sign-probability matrix implemented through a max-cut optimization, and proves strong consistency under weaker signal conditions than those required for unsigned SBMs. In extensive simulations, BSBM outperforms competing methods when edge connectivity signals are weak but sign information is informative, and it remains competitive across varying meta-group sizes, network scales, and numbers of communities. Applications to real-world international relations and protein interaction networks yield interpretable communities that align with known structures and biological pathways, illustrating the practical value of leveraging balance theory in signed networks.

Abstract

Community detection, discovering the underlying communities within a network from observed connections, is a fundamental problem in network analysis, yet it remains underexplored for signed networks. In signed networks, both edge connection patterns and edge signs are informative, and structural balance theory (e.g., triangles aligned with "the enemy of my enemy is my friend" and "the friend of my friend is my friend" are more prevalent) provides a global higher-order principle that guides community formation. We propose a Balanced Stochastic Block Model (BSBM), which incorporates balance theory into the network generating process such that balanced triangles are more likely to occur. We develop a fast profile pseudo-likelihood estimation algorithm with provable convergence and establish that our estimator achieves strong consistency under weaker signal conditions than methods for the binary SBM that rely solely on edge connectivity. Extensive simulation studies and two real-world signed networks demonstrate strong empirical performance.

Paper Structure

This paper contains 22 sections, 4 theorems, 27 equations, 10 figures, 4 tables, 1 algorithm.

Key Result

Proposition 1

For $A \sim \mathrm{BSBM}$, $\mathbb{E}(A_{ij}A_{jk}A_{ki}\mid |A_{ij}A_{jk}A_{ki}|=1)>0$ for any $1\le i<j<k\le n$.
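
Proposition 1 says that, conditional on all three edges of a triple being present, the sign product $A_{ij}A_{jk}A_{ki}$ is positive in expectation, so balanced triangles dominate. The sketch below computes the empirical counterpart of that quantity on an observed network. It is an illustration only, not the paper's code: the function name `mean_triangle_sign` and the toy matrix are hypothetical, and the adjacency matrix is assumed to be symmetric with entries in $\{-1, 0, +1\}$, where $0$ means no edge.

```python
from itertools import combinations

import numpy as np


def mean_triangle_sign(A):
    """Average of A_ij * A_jk * A_ki over all triples i < j < k whose three edges exist.

    A is assumed to be a symmetric matrix with entries in {-1, 0, +1}.
    Returns NaN if the network contains no closed triangle.
    """
    A = np.asarray(A)
    n = A.shape[0]
    products = [
        A[i, j] * A[j, k] * A[k, i]
        for i, j, k in combinations(range(n), 3)
        if A[i, j] != 0 and A[j, k] != 0 and A[k, i] != 0
    ]
    return float(np.mean(products)) if products else float("nan")


# Hypothetical toy network: nodes 0 and 1 form one camp, nodes 2 and 3 another.
# Edges within a camp are positive, edges across camps are negative.
A = np.array([
    [ 0,  1, -1, -1],
    [ 1,  0, -1, -1],
    [-1, -1,  0,  1],
    [-1, -1,  1,  0],
])

# Flipping one within-camp edge to negative makes two of the four triangles unbalanced.
B = A.copy()
B[0, 1] = B[1, 0] = -1

print(mean_triangle_sign(A))  # every triangle is balanced
print(mean_triangle_sign(B))  # balanced and unbalanced triangles cancel out
```

A value above zero means balanced triangles outnumber unbalanced ones among the closed triples, which is the pattern Proposition 1 guarantees in expectation under the BSBM. The triple loop costs $O(n^3)$, so this form is only suitable for small networks.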

Figures (10)

  • Figure 1: The left two are balanced triangles and the right two are unbalanced triangles.
  • Figure 2: The network contains three communities: nodes labeled 1 to 4, nodes labeled 5 to 8, and nodes labeled 9 to 12. Nodes 1–4 and 9–12 belong to meta-group $G_1$, and nodes 5–8 belong to meta-group $G_2$. Left: Community detection under a vanilla SBM using unsigned connectivity (grey edges). Right: Community detection under the BSBM using signed edges (positive in green and negative in red).
  • Figure 3: Comparisons of NMI scores across five methods under six simulation settings. Panels (a)-(f), from top left to bottom right, correspond to: (a) varying within-community probabilities $P_{in}$ with fixed between-community probabilities $P_{bt} = 0.07$; (b) varying the size of meta-groups; (c) varying the magnitude of $\eta$'s when $K = 2$; (d) varying the magnitude of $\eta$'s when $K = 3$; (e) varying the number of nodes $n$; and (f) varying the number of communities $K$.
  • Figure 4: World map showing eight communities identified by our method, with each color representing one community: (1) Germany, Italy, Japan; (2) Hungary, Bulgaria, Romania; (3) Canada, United Kingdom, Yugoslavia, Greece, Russia, Ethiopia, South Africa, China, Australia, New Zealand; (4) United States, Haiti, Nicaragua; (5) France, Spain, Finland; (6) Cuba, Dominican Republic, Mexico, Guatemala, Honduras, El Salvador, Costa Rica, Panama, Colombia, Venezuela, Ecuador, Peru, Brazil, Bolivia, Paraguay, Chile, Argentina, Uruguay; (7) Portugal, Turkey, Thailand; (8) Switzerland, Sweden, Iran, Iraq, Egypt, Saudi Arabia, Yemen Arab Republic, Afghanistan, Mongolia.
  • Figure 5: Density plot comparing the distribution of protein ratios obtained from our method and the PPL method. The protein ratio ($M_1/M_2$) for a given community represents the fraction of proteins ($M_1$) annotated to the enriched pathway out of the total number of proteins in that community ($M_2$). Higher ratios indicate more coherent and functionally homogeneous communities.
  • ...and 5 more figures

Theorems & Definitions (5)

  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Remark 1
  • Corollary 2.1