RECCS: Realistic Cluster Connectivity Simulator for Synthetic Network Generation

Lahari Anne, The-Anh Vu-Le, Minhyuk Park, Tandy Warnow, George Chacko

TL;DR

RECCS addresses the SBM limitation of producing disconnected ground-truth clusters by introducing a two-phase connectivity correction: (i) model the clustered subnetwork with a degree‑corrected SBM constrained to match $Param(G_c,\mathcal{C})$ and enforce $k(C)$-bounded connectivity via iterative minimum cuts, and (ii) model outliers as singleton clusters and merge the two parts into a single synthetic network. The approach yields two variants focusing on the clustered-subnetwork modeling and demonstrates improved fidelity to real-world networks (up to $13.989\times 10^{6}$ nodes) compared with naive SBM generation, based on replication studies and runtime metrics. The paper also evaluates clustering recovery on RECCS-generated synthetics using multiple real networks (development and testing sets) and various clustering methods, highlighting the utility of RECCS for benchmarking and method selection. Overall, RECCS provides a scalable toolkit for generating realistic synthetic networks with well-connected clusters, aiding ground-truth evaluation and comparison of community detection strategies.

Abstract

The limited availability of useful ground-truth communities in real-world networks presents a challenge to evaluating and selecting a "best" community detection method for a given network or family of networks. The use of synthetic networks with planted ground truths is one way to address this challenge. While several synthetic network generators can be used for this purpose, Stochastic Block Models (SBMs), when provided input parameters from real-world networks and clusterings, are well suited to producing networks that retain the properties of the network they are intended to model. We report, however, that SBMs can produce disconnected ground-truth clusters, even under conditions where the input clusters are connected. In this study, we describe the REalistic Cluster Connectivity Simulator (RECCS), which, while retaining approximately the same quality for other network and cluster parameters, creates an SBM synthetic network and then modifies it to ensure an improved fit to cluster connectivity. We report results using parameters obtained from clustered real-world networks ranging up to 13.9 million nodes in size, and demonstrate an improvement over the unmodified use of SBMs for network generation.
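
The problem the abstract describes, a planted cluster whose induced subgraph falls apart into several pieces, can be illustrated with a minimal sketch. It is not the RECCS algorithm: the TL;DR states that RECCS enforces a connectivity bound using iterative minimum cuts, whereas this sketch only detects disconnected clusters and adds one edge between consecutive components so that each cluster becomes connected. The function names `cluster_components` and `connect_clusters`, the edge-list input, and the random endpoint choice are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def cluster_components(edges, clusters):
    """Return {cluster_id: [component_node_set, ...]} for each cluster's induced subgraph."""
    label = {v: c for c, members in clusters.items() for v in members}
    adj = defaultdict(set)
    for u, v in edges:
        # Keep only edges whose endpoints lie in the same cluster.
        if u in label and v in label and label[u] == label[v]:
            adj[u].add(v)
            adj[v].add(u)
    result = {}
    for c, members in clusters.items():
        seen, comps = set(), []
        for start in sorted(members):
            if start in seen:
                continue
            # Depth-first search restricted to intra-cluster edges.
            comp, stack = {start}, [start]
            while stack:
                x = stack.pop()
                for y in adj[x]:
                    if y not in comp:
                        comp.add(y)
                        stack.append(y)
            seen |= comp
            comps.append(comp)
        result[c] = comps
    return result


def connect_clusters(edges, clusters, seed=0):
    """Hypothetical repair step: chain the components of each cluster with one new edge each."""
    rng = random.Random(seed)
    new_edges = list(edges)
    for comps in cluster_components(edges, clusters).values():
        for prev, nxt in zip(comps, comps[1:]):
            u = rng.choice(sorted(prev))
            v = rng.choice(sorted(nxt))
            new_edges.append((u, v))
    return new_edges


# Toy network: cluster "A" is split into two pieces, cluster "B" is connected.
edges = [(1, 2), (3, 4), (5, 6)]
clusters = {"A": {1, 2, 3, 4}, "B": {5, 6}}
fixed = connect_clusters(edges, clusters)
```

Adding the fewest edges needed for connectivity changes the edge count and degree sequence only slightly, which is in the spirit of the abstract's goal of retaining approximately the same quality for other network and cluster parameters. Meeting a stronger connectivity target would require the minimum-cut machinery that this sketch leaves out.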

Paper Structure

This paper contains 13 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: RECCS Overall Workflow. The workflow modifies an initial network by adding edges to improve its fit to the input parameters. The input to RECCS consists of parameters derived from a real-world network $N$ and its estimated clustering. Step 1 generates a synthetic network representing the clustered subnetwork of $N$, while Step 2 generates a network modeling the "outlier" nodes. The two networks are then merged in Step 3. For more details, see Section \ref{sec:reccs-workflow}.
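
The data flow in Figure 1 can be sketched at the level of edge lists. The code below only shows how an input network might be partitioned into a clustered part and an outlier part (with outliers treated as singleton clusters, as the TL;DR describes) and how two edge lists can be merged. It does not generate any synthetic network, and the function names and the rule for assigning edges to the outlier part are assumptions rather than details taken from the paper.

```python
def split_clustered_and_outliers(edges, clusters):
    """Partition an edge list into a clustered part and an outlier part.

    Assumption: a node is an outlier if it appears in an edge but in no cluster.
    Nodes with no edges are not visible in an edge list and are ignored here.
    """
    clustered = set().union(*clusters.values())
    nodes = {v for edge in edges for v in edge}
    outliers = nodes - clustered
    # Edges with both endpoints clustered feed Step 1.
    clustered_edges = [(u, v) for u, v in edges if u in clustered and v in clustered]
    # Edges touching at least one outlier feed Step 2.
    outlier_edges = [(u, v) for u, v in edges if u in outliers or v in outliers]
    # Each outlier becomes its own singleton cluster.
    singletons = {f"outlier_{v}": {v} for v in sorted(outliers)}
    return clustered_edges, outlier_edges, singletons


def merge_networks(part_a, part_b):
    """Step 3 sketch: union of two undirected edge lists, with duplicates removed."""
    merged = {tuple(sorted(e)) for e in part_a} | {tuple(sorted(e)) for e in part_b}
    return sorted(merged)


# Toy input: nodes 1-3 form cluster "A"; nodes 4 and 5 are outliers.
edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
clusters = {"A": {1, 2, 3}}
clustered_edges, outlier_edges, singletons = split_clustered_and_outliers(edges, clusters)
merged = merge_networks(clustered_edges, outlier_edges)
```

In this toy case the merge reproduces the input edge list, because nothing was regenerated in between. In the workflow of Figure 1, each part would first be replaced by a synthetic counterpart before the merge.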