RECCS: Realistic Cluster Connectivity Simulator for Synthetic Network Generation
Lahari Anne, The-Anh Vu-Le, Minhyuk Park, Tandy Warnow, George Chacko
TL;DR
RECCS addresses a limitation of stochastic block models (SBMs), namely that they can produce disconnected ground-truth clusters, through a two-phase procedure: (i) it models the clustered subnetwork with a degree-corrected SBM constrained to match $\mathrm{Param}(G_c,\mathcal{C})$, the parameters estimated from the clustered subnetwork $G_c$ and its clustering $\mathcal{C}$, and then enforces $k(C)$-bounded connectivity for each cluster $C$ via iterative minimum cuts; and (ii) it models outliers as singleton clusters and merges the two parts into a single synthetic network. The paper presents two variants that differ in how the clustered subnetwork is modeled and, in replication studies that also report runtime, shows improved fidelity to real-world networks (up to about 14 million, or $13.989\times 10^{6}$, nodes) compared with unmodified SBM generation. The paper also evaluates clustering recovery on RECCS-generated synthetic networks derived from multiple real networks (split into development and testing sets) using various clustering methods, highlighting the utility of RECCS for benchmarking and method selection. Overall, RECCS provides a scalable toolkit for generating realistic synthetic networks with well-connected clusters, supporting ground-truth evaluation and comparison of community detection methods.
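The connectivity-correction idea in phase (i) can be illustrated with a small pure-Python sketch: compute a global minimum edge cut of one cluster's induced subgraph with the Stoer-Wagner algorithm, add an edge across that cut, and repeat until the cut reaches a threshold. This is a simplified illustration, not the RECCS implementation: the function names `stoer_wagner` and `enforce_min_cut`, the integer threshold `k` (standing in for $k(C)$, whose exact definition is not given here), and the rule for choosing which edge to add are all assumptions. The cubic-time cut routine is only meant for small clusters.

```python
def stoer_wagner(nodes, edges):
    """Global minimum edge cut of an undirected graph (Stoer-Wagner).

    Returns (cut_value, side), where `side` is the set of original nodes on
    one side of a minimum cut. Parallel edges count as extra weight.
    """
    w = {u: {} for u in nodes}  # edge weights between (super)nodes
    for u, v in edges:
        if u != v:
            w[u][v] = w[u].get(v, 0) + 1
            w[v][u] = w[v].get(u, 0) + 1
    groups = {u: {u} for u in nodes}  # supernode -> original nodes merged into it
    active = list(nodes)
    best_value, best_side = float("inf"), set()
    while len(active) > 1:
        # One minimum-cut phase: repeatedly add the most tightly connected
        # remaining supernode to the growing set.
        start = active[0]
        order = [start]
        conn = {u: w[start].get(u, 0) for u in active if u != start}
        while conn:
            nxt = max(conn, key=conn.get)
            order.append(nxt)
            del conn[nxt]
            for u in conn:
                conn[u] += w[nxt].get(u, 0)
        s, t = order[-2], order[-1]
        # Cut of the phase: total weight from t to all other active supernodes.
        cut_of_phase = sum(w[t].values())
        if cut_of_phase < best_value:
            best_value, best_side = cut_of_phase, set(groups[t])
        # Merge supernode t into s.
        for u, wt in w[t].items():
            if u != s:
                w[s][u] = w[s].get(u, 0) + wt
                w[u][s] = w[u].get(s, 0) + wt
            del w[u][t]
        del w[t]
        groups[s] |= groups.pop(t)
        active.remove(t)
    return best_value, best_side


def enforce_min_cut(nodes, edges, k):
    """Hypothetical repair loop: add edges across a global minimum cut of one
    cluster until its minimum cut is at least k (or no edge can be added)."""
    nodes = sorted(nodes)
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    while len(nodes) > 1:
        value, side = stoer_wagner(nodes, edges)
        if value >= k:
            break
        other = [v for v in nodes if v not in side]
        # Illustrative choice: the first non-adjacent pair straddling the cut.
        pair = next(((u, v) for u in sorted(side) for v in other
                     if frozenset((u, v)) not in present), None)
        if pair is None:
            break  # every cross pair is already an edge, so k is unreachable
        edges.append(pair)
        present.add(frozenset(pair))
    return edges


path = [(0, 1), (1, 2), (2, 3)]
print(stoer_wagner(range(4), path)[0])  # 1
repaired = enforce_min_cut(range(4), path, k=2)
print(stoer_wagner(range(4), repaired)[0] >= 2)  # True
```

Adding one edge raises only the particular cut it crosses, so the loop recomputes the minimum cut after each addition; it terminates because a simple graph has finitely many possible edges.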
Abstract
The limited availability of useful ground-truth communities in real-world networks makes it challenging to evaluate and select a "best" community detection method for a given network or family of networks. Synthetic networks with planted ground truths offer one way to address this challenge. While several synthetic network generators can be used for this purpose, Stochastic Block Models (SBMs), when given input parameters derived from real-world networks and their clusterings, are well suited to producing networks that retain the properties of the networks they are intended to model. We report, however, that SBMs can produce disconnected ground-truth clusters, even when the input clusters are connected. In this study, we describe the REalistic Cluster Connectivity Simulator (RECCS), which creates an SBM synthetic network and then modifies it to improve the fit to cluster connectivity, while retaining approximately the same quality for other network and cluster parameters. We report results using parameters obtained from clustered real-world networks with up to 13.9 million nodes, and demonstrate an improvement over the unmodified use of SBMs for network generation.
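To make the reported failure mode concrete, the following Python sketch checks which clusters of a generated network induce disconnected subgraphs, using breadth-first search restricted to intra-cluster edges. The function name `disconnected_clusters` and the input representation (an edge list plus a node-to-cluster dictionary) are illustrative assumptions, not the RECCS interface.

```python
from collections import defaultdict, deque


def disconnected_clusters(edges, clustering):
    """Return the set of cluster ids whose induced subgraph is disconnected.

    edges: iterable of (u, v) pairs of an undirected network.
    clustering: dict mapping node -> cluster id (unlisted nodes are ignored).
    """
    members = defaultdict(set)
    for node, cid in clustering.items():
        members[cid].add(node)
    # Keep only intra-cluster edges: they alone define each induced subgraph.
    adj = defaultdict(set)
    for u, v in edges:
        if u in clustering and v in clustering and clustering[u] == clustering[v]:
            adj[u].add(v)
            adj[v].add(u)
    bad = set()
    for cid, nodes in members.items():
        start = next(iter(nodes))
        seen = {start}
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        if len(seen) < len(nodes):  # BFS missed some cluster members
            bad.add(cid)
    return bad


clustering = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
edges = [(0, 1), (1, 3), (2, 3), (3, 4)]
print(disconnected_clusters(edges, clustering))  # {'A'}
```

In this toy example the network as a whole is connected, yet node 2 reaches the rest of cluster A only through node 3 in cluster B, so cluster A is disconnected: overall connectivity does not imply cluster connectivity.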
