Improved Community Detection using Stochastic Block Models
Minhyuk Park, Daniel Wang Feng, Siya Digra, The-Anh Vu-Le, George Chacko, Tandy Warnow
TL;DR
This work investigates the tendency of stochastic block models (SBMs) to produce disconnected clusters on large real-world and synthetic networks. It introduces simple post-processing strategies (Connected Components (CC), Well-Connected Clusters (WCC), and the Connectivity Modifier (CM)) that enforce edge-connectivity and improve clustering quality. Across 122 real networks and numerous synthetic LFR benchmark networks, CC and especially WCC improve accuracy, as measured by the adjusted Rand index (ARI), adjusted mutual information (AMI), and normalized mutual information (NMI), while maintaining reasonable coverage. CM is more variable and can reduce accuracy. The findings provide practical, low-complexity remedies for SBM-based community detection in large-scale graphs, and WCC is recommended as the default post-processing step.
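The CC and WCC post-processing steps named in the TL;DR can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. Clusters are sets of integer node IDs, and the graph is simple and undirected, given as a dict mapping each node to its neighbor set. The well-connectedness test (keep a piece of n nodes only if its minimum edge cut exceeds log10(n)) is an assumed criterion. The helper names `connected_components`, `stoer_wagner`, `wcc`, and `clique` and the `min_size` parameter are hypothetical. CC keeps the connected components of each cluster's induced subgraph. The WCC sketch repeatedly splits a cluster along a global minimum cut, found here with the Stoer-Wagner algorithm, until every remaining piece passes the test.

```python
import math


def connected_components(cluster, adj):
    """CC: split one cluster into the connected components of its induced subgraph."""
    remaining = set(cluster)
    components = []
    while remaining:
        root = remaining.pop()
        comp, frontier = {root}, [root]
        while frontier:
            u = frontier.pop()
            for v in adj.get(u, ()):
                if v in remaining:  # only follow edges that stay inside the cluster
                    remaining.remove(v)
                    comp.add(v)
                    frontier.append(v)
        components.append(comp)
    return components


def stoer_wagner(cluster, adj):
    """Global minimum edge cut of the subgraph induced by `cluster` (Stoer-Wagner).

    Returns (cut_size, one_side_of_the_cut). Assumes a simple, unweighted,
    undirected graph with symmetric adjacency sets.
    """
    # Edge weights between supernodes; every original edge has weight 1.
    w = {u: {v: 1 for v in adj.get(u, ()) if v in cluster and v != u} for u in cluster}
    groups = {u: {u} for u in cluster}  # original nodes merged into each supernode
    best_cut, best_side = math.inf, None
    while len(w) > 1:
        # One phase: maximum-adjacency ordering from an arbitrary start supernode.
        start = next(iter(w))
        order = [start]
        conn = {v: w[start].get(v, 0) for v in w if v != start}
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            order.append(nxt)
            for v, c in w[nxt].items():
                if v in conn:
                    conn[v] += c
        s, t = order[-2], order[-1]
        if cut_of_phase < best_cut:  # cut separating t's group from everything else
            best_cut, best_side = cut_of_phase, set(groups[t])
        # Merge the last two supernodes of the phase.
        for v, c in w[t].items():
            if v != s:
                w[s][v] = w[s].get(v, 0) + c
                w[v][s] = w[v].get(s, 0) + c
            del w[v][t]
        del w[t]
        groups[s] |= groups.pop(t)
    return best_cut, best_side


def wcc(cluster, adj, min_size=2):
    """WCC sketch: split along min cuts until each piece is 'well-connected'.

    Assumed criterion: keep a piece of n nodes only if its min cut > log10(n).
    Pieces smaller than the hypothetical `min_size` are dropped (left unclustered).
    """
    pending, result = [set(cluster)], []
    while pending:
        piece = pending.pop()
        if len(piece) < min_size:
            continue
        cut, side = stoer_wagner(piece, adj)
        if cut > math.log10(len(piece)):
            result.append(piece)
        else:  # a disconnected piece has cut 0, so CC-style splits also happen here
            pending.extend([side, piece - side])
    return result


def clique(nodes):
    nodes = list(nodes)
    return {u: {v for v in nodes if v != u} for u in nodes}


# Hypothetical input: two 6-cliques joined by the single bridge edge (5, 6),
# returned as one cluster.
adj = {**clique(range(6)), **clique(range(6, 12))}
adj[5].add(6)
adj[6].add(5)
print(sorted(sorted(c) for c in connected_components(range(12), adj)))  # one component
print(sorted(sorted(c) for c in wcc(range(12), adj)))
# -> [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
```

In this toy example CC leaves the cluster intact because it is connected. WCC splits it: the bridge gives a minimum cut of 1, which does not exceed log10(12) ≈ 1.08, while each 6-clique has minimum cut 5 and is kept. This naive Stoer-Wagner scales poorly and only illustrates the logic; it is not suited to the large networks studied here.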
Abstract
Community detection approaches resolve complex networks into smaller groups (communities) that are expected to be relatively edge-dense and well-connected. The stochastic block model (SBM) is one of several approaches used to uncover community structure in graphs. In this study, we demonstrate that SBM software applied to various real-world and synthetic networks produces poorly connected or even disconnected clusters. We present simple modifications that improve the connectivity of SBM clusters, and we show on simulated networks that these modifications also improve accuracy.
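Accuracy on simulated networks is measured by comparing a clustering against the planted ground-truth communities, using metrics such as ARI, AMI, and NMI. As an illustration of one of them, here is a minimal pure-Python sketch of the adjusted Rand index (ARI). It assumes both partitions are given as per-node label lists in the same node order, and the function name `adjusted_rand_index` is hypothetical. How nodes left unclustered by post-processing are scored (for example, each placed in its own singleton cluster) is a convention this section does not state, so a caller would have to choose one.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two partitions given as per-node label lists (same node order)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    pairs = comb(len(labels_true), 2)
    # Node pairs placed together in both partitions (cells of the contingency table).
    index = sum(comb(n, 2) for n in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(n, 2) for n in Counter(labels_true).values())
    sum_pred = sum(comb(n, 2) for n in Counter(labels_pred).values())
    expected = sum_true * sum_pred / pairs if pairs else 0.0
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are identical and trivial
    return (index - expected) / (max_index - expected)


# Hypothetical ground truth vs. a clustering that merges communities 1 and 2.
print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # approximately 0.571 (4/7)
```

ARI is 1.0 for identical partitions (up to relabeling) and close to 0 for chance-level agreement. In practice a library implementation such as scikit-learn's `adjusted_rand_score` would normally be used instead.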
