Improved Community Detection using Stochastic Block Models
Minhyuk Park, Daniel Wang Feng, Siya Digra, The-Anh Vu-Le, Lahari Anne, George Chacko, Tandy Warnow
TL;DR
The paper investigates edge connectivity in community detection with Stochastic Block Models (SBMs) and finds that SBMs frequently yield internally disconnected communities on real networks. It introduces Well-Connected Clusters (WCC), a post-processing method that repeatedly removes small edge cuts until every community is well-connected, and compares it with two alternatives: Connectivity Modifier (CM) and simple Connected Components (CC). Across large-scale synthetic networks (LFR and RECCS) and real networks, SBM+WCC generally improves clustering accuracy (ARI/NMI/AGRI/RMI) while remaining scalable to networks with millions of nodes, whereas CM shows mixed effects and CC can reduce node coverage. The authors also explain why the degree-corrected SBM produces disconnected communities and why WCC outperforms CM, linking the behavior to the description-length objective, and they provide an open-source implementation for practical use.
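To make the CC baseline concrete, the following is a minimal sketch of splitting each community into its connected components. It is not the authors' implementation. The function names, the adjacency-dictionary input, and the `min_size` filter are assumptions introduced here for illustration; discarding small fragments is shown only as one way such a step could lower node coverage.

```python
from collections import deque

def split_into_components(adj, community):
    """Connected components of the subgraph induced by `community`.

    `adj` maps each node to the set of its neighbors (undirected graph).
    """
    unseen, parts = set(community), []
    while unseen:
        root = unseen.pop()
        comp, queue = {root}, deque([root])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v in unseen:          # only follow edges inside the community
                    unseen.remove(v)
                    comp.add(v)
                    queue.append(v)
        parts.append(comp)
    return parts

def cc_postprocess(adj, communities, min_size=2):
    """Hypothetical CC step: split every community, drop tiny fragments."""
    out = []
    for c in communities:
        out.extend(p for p in split_into_components(adj, c) if len(p) >= min_size)
    return out

# One "community" that is internally disconnected: a path 0-1-2, an edge 3-4,
# and an isolated node 5 (which has no entry in `adj`).
adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
parts = cc_postprocess(adj, [{0, 1, 2, 3, 4, 5}])
print(sorted(sorted(p) for p in parts))
```

In this toy input the isolated node ends up in no community, which illustrates how a components-only repair can trade coverage for connectivity.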
Abstract
Identifying edge-dense communities that are also well-connected is an important aspect of understanding community structure. Prior work has shown that community detection methods can produce poorly connected communities, and some can even produce internally disconnected communities. In this study, we evaluate the connectivity of communities obtained using Stochastic Block Models (SBMs). We find that SBMs produce internally disconnected communities from real-world networks. We present a simple technique, Well-Connected Clusters (WCC), which repeatedly removes small edge cuts until the communities meet a user-specified threshold for well-connectivity. Our study using a large collection of synthetic networks based on clustered real-world networks shows that using WCC as a post-processing tool with SBM community detection typically improves clustering accuracy. WCC is fast enough to use on networks with millions of nodes and is freely available in open-source form.
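The loop described in the abstract, repeatedly removing small edge cuts until every community passes a threshold, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the released WCC code: the function names are hypothetical, the minimum cut is found with a plain Stoer-Wagner routine written for readability rather than speed, and the default threshold (a cut must be larger than log10 of the community size) is an assumed stand-in for the user-specified threshold.

```python
import math

def min_cut(nodes, adj):
    """Stoer-Wagner global minimum cut of the subgraph induced by `nodes`.

    Returns (cut_size, one_side). Assumes an undirected, unweighted graph
    given as a symmetric dict of neighbor sets, and len(nodes) >= 2.
    """
    # w[u][v] = number of original edges between super-nodes u and v
    w = {u: {v: 1 for v in adj.get(u, ()) if v in nodes and v != u} for u in nodes}
    members = {u: {u} for u in nodes}      # original nodes merged into each super-node
    best_size, best_side = math.inf, None
    while len(w) > 1:
        # One phase: repeatedly add the vertex most tightly connected to the grown set.
        start = next(iter(w))
        order = [start]
        conn = {v: w[start].get(v, 0) for v in w if v != start}
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)   # weight between nxt and everything added so far
            order.append(nxt)
            for v, wt in w[nxt].items():
                if v in conn:
                    conn[v] += wt
        s, t = order[-2], order[-1]
        if cut_of_phase < best_size:       # cut separating t's super-node from the rest
            best_size, best_side = cut_of_phase, set(members[t])
        # Merge t into s before the next phase.
        members[s] |= members.pop(t)
        for v, wt in w.pop(t).items():
            del w[v][t]
            if v != s:
                w[s][v] = w[s].get(v, 0) + wt
                w[v][s] = w[s][v]
    return best_size, best_side

def wcc_sketch(adj, clusters, threshold=lambda n: math.log10(n)):
    """Split clusters until each one's minimum cut exceeds threshold(size)."""
    done, stack = [], [set(c) for c in clusters]
    while stack:
        c = stack.pop()
        if len(c) < 2:
            done.append(c)                 # a singleton cannot be cut further
            continue
        size, side = min_cut(c, adj)
        if size > threshold(len(c)):
            done.append(c)                 # well-connected under the assumed rule
        else:
            # Working on induced subgraphs implicitly removes the cut edges.
            stack.extend([side, c - side])
    return done

def clique(nodes):
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]

# Two 6-cliques joined by a single bridge edge, given as one input community.
edges = clique(list(range(6))) + clique(list(range(6, 12))) + [(5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(sorted(sorted(c) for c in wcc_sketch(adj, [set(range(12))])))
```

Because a disconnected community has a minimum cut of size 0, the same loop also separates internally disconnected communities without a separate connected-components pass. This sketch keeps singletons in the output; whether very small clusters are kept or discarded is a design choice not settled by the text above.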
