
Counting communities in weighted Stochastic Block Models via semidefinite programming

Deborah Oliveira, Andressa Cerqueira, Roberto Oliveira

TL;DR

This work addresses estimating the number of communities in weighted, balanced SBMs by developing SDP-based hypothesis tests and estimators. A key novelty is a universality result that replaces the SDP functional of centered, sub-gamma weight matrices with a GOE surrogate, enabling precise thresholding and decision rules for distinguishing candidate numbers of communities and recovering memberships. The approach yields a consistent sequential estimator of the number of communities $K$ and provides partial recovery guarantees for community assignments in both the two-community and multi-community cases, under explicit mean-gap and weight-variance conditions. An illustrative zero-inflated Gaussian model demonstrates the method's applicability, and simulations show that SDP-based methods outperform several alternatives in sparse-to-moderate regimes. Overall, the findings extend SDP-based community detection to weighted SBMs and establish a practical, theoretically grounded pathway for learning the true number of communities in complex networks.
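To make the illustrative zero-inflated Gaussian model concrete, here is a minimal Python/NumPy sketch that samples a balanced weighted SBM. The parametrization is an assumption, not taken from the paper: each pair $i<j$ carries a nonzero weight with probability $\rho$ (the sparsity parameter in the figures), and that weight is drawn from $N(\mu_{in},\sigma^2)$ within a community or $N(\mu_{out},\sigma^2)$ across communities. The function name `sample_weighted_sbm` and the default $\sigma=1$ are hypothetical.

```python
import numpy as np


def sample_weighted_sbm(n, K, rho, mu_in, mu_out, sigma=1.0, rng=None):
    """Sample a balanced weighted SBM with zero-inflated Gaussian weights.

    Hypothetical parametrization: each pair i < j is nonzero with
    probability rho; a nonzero weight is N(mu_in, sigma^2) inside a
    community and N(mu_out, sigma^2) across communities.
    """
    if n % K != 0:
        raise ValueError("a balanced model needs K to divide n")
    rng = np.random.default_rng(rng)
    # Balanced labels: exactly n // K nodes per community, in random order.
    labels = rng.permutation(np.repeat(np.arange(K), n // K))
    same = labels[:, None] == labels[None, :]
    # A Gaussian weight for every pair, with the block-dependent mean.
    gauss = rng.normal(np.where(same, mu_in, mu_out), sigma)
    # Zero-inflation: keep each weight with probability rho.
    present = rng.random((n, n)) < rho
    upper = np.triu(np.where(present, gauss, 0.0), k=1)
    # Symmetrize; the diagonal stays zero (no self-loops).
    return upper + upper.T, labels
```

For example, `sample_weighted_sbm(n=90, K=3, rho=0.4, mu_in=3.0, mu_out=0.0, rng=0)` returns a $90 \times 90$ symmetric weight matrix together with balanced labels, 30 nodes per community.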

Abstract

We consider the problem of estimating the number of communities in a weighted balanced Stochastic Block Model. We construct hypothesis tests, based on semidefinite programming and on a statistic derived from a GOE matrix, to distinguish between any two candidate numbers of communities. This is possible due to a universality result, which we also prove, for a function based on semidefinite programming. The tests are then combined into a sequential test that estimates the number of communities. We also construct estimators of the communities themselves.
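The abstract chains pairwise tests into a sequential estimate of $K$. The sketch below shows one common way to do that: test $H_0: K=k$ against $H_1: K>k$ for $k=1,2,\dots$ and return the first $k$ that is not rejected. It is not the paper's SDP test. As a swapped-in, purely spectral stand-in, it rejects $H_0: K=k$ when the $(k+1)$-th largest eigenvalue of $W$ in absolute value exceeds $(1+\text{slack})\cdot 2\sqrt{\hat v\, n}$, the semicircle edge of a GOE-like noise matrix whose entry variance is estimated by $\hat v$. The names `estimate_num_communities`, `k_max` and `slack` are hypothetical.

```python
import numpy as np


def estimate_num_communities(W, k_max, slack=0.1):
    """Sequentially test H0: K = k vs H1: K > k and return the first k kept.

    Spectral stand-in for the paper's SDP/GOE-based test (an assumption):
    reject H0: K = k when the (k+1)-th largest |eigenvalue| of W exceeds
    the GOE-style bulk edge 2 * sqrt(v_hat * n), inflated by `slack`.
    """
    n = W.shape[0]
    if not 1 <= k_max < n:
        raise ValueError("k_max must satisfy 1 <= k_max < n")
    off_diag = ~np.eye(n, dtype=bool)
    # Entry variance estimated from all off-diagonal weights. This also
    # absorbs between-block mean differences, so the threshold is conservative.
    v_hat = W[off_diag].var()
    threshold = (1.0 + slack) * 2.0 * np.sqrt(v_hat * n)
    # Absolute eigenvalues, largest first (W is symmetric).
    abs_eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    for k in range(1, k_max + 1):
        # abs_eigs[k] is the (k+1)-th largest; keep H0: K = k if it lies in the bulk.
        if abs_eigs[k] <= threshold:
            return k
    # Every H0 up to k_max was rejected: report the cap.
    return k_max
```

Because $\hat v$ also absorbs the mean gap between blocks, this stand-in tends to under-count communities when the gap is small.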

Paper Structure

This paper contains 18 sections, 25 theorems, 298 equations, 12 figures and 1 table.

Key Result

Theorem A

In the setting above, if …, then the Type I and Type II errors of the test $T(X;\delta)$ approach zero asymptotically.

Figures (12)

  • Figure 1: Estimation error of different community detection approaches as a function of the sparsity parameter $\rho$ of the model, with $\mu_{in}-\mu_{out}=3$.
  • Figure 2: Estimation error of different community detection approaches as a function of the difference of the mean weights $\mu_{in}-\mu_{out}$, with $\rho=0.4$.
  • Figure 3: Mean and one-standard-deviation error bars of the estimated number of communities as a function of the sparsity parameter $\rho$, with $|\mu_{in}-\mu_{out}|=4$. (A) and (B): model with three communities; (C) and (D): model with four communities.
  • Figure 4: Mean and one-standard-deviation error bars of the estimated number of communities as a function of the difference of the mean weights $\mu_{in}-\mu_{out}$. (A) and (B): model with three communities and $\rho=0.5$; (C) and (D): model with four communities and $\rho=0.8$.
  • Figure 5: Configuration $\mathcal{C}$ in the case $r=6$, $s=3$ and $m=2$.
  • ...and 7 more figures

Theorems & Definitions (69)

  • Definition A
  • Theorem A
  • Theorem B
  • Theorem C
  • Theorem D
  • Theorem E
  • Theorem F
  • Remark 1
  • Remark 2
  • Theorem 1
  • ...and 59 more