EC-SBM Synthetic Network Generator

The-Anh Vu-Le, Lahari Anne, George Chacko, Tandy Warnow

TL;DR

This study proposes a new synthetic network generator called the Edge-Connected Stochastic Block Model (EC-SBM), which aims to take a given clustered real-world network and produce a synthetic network that resembles the clustered real-world network with respect to both network and community-specific criteria.

Abstract

Generating high-quality synthetic networks with realistic community structure is vital to effectively evaluate community detection algorithms. In this study, we propose a new synthetic network generator called the Edge-Connected Stochastic Block Model (EC-SBM). The goal of EC-SBM is to take a given clustered real-world network and produce a synthetic network that resembles the clustered real-world network with respect to both network and community-specific criteria. In particular, we focus on simulating the internal edge connectivity of the clusters in the reference clustered network. Our extensive performance study on large real-world networks shows that EC-SBM has high accuracy in both network and community-specific criteria, and is generally more accurate than current alternative approaches for this problem. Furthermore, EC-SBM is fast enough to scale to real-world networks with millions of nodes.

Paper Structure

This paper contains 28 sections, 1 theorem, 7 figures, 2 tables.

Key Result

Theorem 1

Let $G_n$ be the $n$-vertex graph produced by Step 1 of Stage 1 (Figure 3), and let $\lambda(G_n)$ be its minimum edge-cut size. Then $\lambda(G_n) \geq k$.
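The Step 1 construction described in the Figure 3 caption can be checked numerically against Theorem 1. The sketch below is illustrative only: the function names are hypothetical, choosing the $k$ earlier neighbours uniformly at random is an assumption (the caption does not say how they are chosen), and $\lambda$ is computed with a plain Edmonds-Karp max flow rather than any method from the paper.

```python
import random
from collections import deque


def build_k_edge_connected(n, k, seed=0):
    """Hypothetical sketch of the Figure 3 construction (requires n >= k + 1).

    Vertices 0..k form a (k+1)-clique; each later vertex v is joined to k
    distinct earlier vertices. Picking those k vertices uniformly at random
    is an assumption made for illustration only.
    """
    if n < k + 1:
        raise ValueError("need n >= k + 1 vertices")
    rng = random.Random(seed)
    # Steps 1..k+1: the (k+1)-clique on vertices 0..k.
    edges = {(u, v) for u in range(k + 1) for v in range(u + 1, k + 1)}
    # Steps k+2..n: attach each new vertex to k previously processed vertices.
    for v in range(k + 1, n):
        for u in rng.sample(range(v), k):
            edges.add((u, v))
    return edges


def max_flow(n, edges, s, t):
    """Edmonds-Karp max flow; each undirected edge has capacity 1 both ways."""
    cap = [[0] * n for _ in range(n)]
    for u, v in edges:
        cap[u][v] += 1
        cap[v][u] += 1
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path from s to t.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            x = queue.popleft()
            for y in range(n):
                if parent[y] == -1 and cap[x][y] > 0:
                    parent[y] = x
                    queue.append(y)
        if parent[t] == -1:
            return flow
        # Collect the path, push its bottleneck, and update residual capacities.
        path = []
        y = t
        while y != s:
            path.append((parent[y], y))
            y = parent[y]
        push = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= push
            cap[b][a] += push
        flow += push


def edge_connectivity(n, edges):
    """lambda(G) for undirected G: min over t of the s-t max flow, s fixed."""
    return min(max_flow(n, edges, 0, t) for t in range(1, n))


if __name__ == "__main__":
    n, k = 30, 4
    G = build_k_edge_connected(n, k, seed=1)
    print(edge_connectivity(n, G))  # 4
```

The printed value is exactly $k$: Theorem 1 gives a lower bound of $k$, and the last vertex added has degree exactly $k$, so it cannot be larger. Fixing one source and minimizing over all sinks suffices because any global minimum cut separates vertex 0 from some other vertex.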

Figures (7)

  • Figure 1: Proportion of disconnected clusters (top) and excess edges (bottom) in synthetic networks generated by SBM. SBM is given network and clustering statistics for $74$ networks, each clustered by one of the clustering methods specified on the horizontal axis; all of these clustering methods are guaranteed to produce connected clusters. SBM then uses these parameters to produce a synthetic network. While the choice of clustering method affects the proportion of internally disconnected clusters and the overall proportion of excess edges, SBM produces many disconnected clusters and many excess edges for every tested clustering.
  • Figure 2: Stage 1 of EC-SBM: Generation of the synthetic clustered subnetwork. The empirical cluster assignment is maintained as the synthetic cluster assignment. In Step 1, we generate for each cluster a $k$-edge-connected subnetwork, where $k$ is the desired edge connectivity of that cluster. In Step 2a, we use SBM to generate the remaining edges according to the updated parameters; this can produce parallel edges and self-loops (dashed). In Step 2b, we remove these excess edges to obtain the final output.
  • Figure 3: Step 1 of Stage 1 of EC-SBM: Generation of a $k$-edge-connected subnetwork. In the first $k+1$ steps, we construct a $(k+1)$-clique. From Step $k+2$ up to $n$, we process the remaining vertices sequentially by making each vertex adjacent to $k$ previously processed vertices.
  • Figure 4: The evaluation process of synthetic network generators. An empirical clustering is obtained by running a community detection method on the empirical network. Using the empirical network and clustering, the generator produces a synthetic network and clustering. Various statistics can be computed on the empirical and synthetic pair and compared. The resulting distance quantifies the generator's performance and can be used to compare generators.
  • Figure 5: Impact of the input clustering on the similarity between the clustered input network and the synthetic network. The comparison is done on 74 networks with respect to 4 different statistics (see the statistics table). A distance value closer to $0.0$ is preferred. For each generator, no single input clustering is best across all statistics. For SBM and EC-SBM, SBM+WCC has an advantage on the pseudo-diameter and the characteristic time. For RECCS, SBM+WCC and Leiden-Mod+CM are comparable, with the former performing well on the characteristic time and the latter performing well on the other statistics.
  • ...and 2 more figures
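The Step 2b cleanup in the Figure 2 caption and the two quantities plotted in Figure 1 can be sketched on plain edge lists. This is a minimal sketch under stated assumptions: "excess edges" is taken to mean self-loops plus repeated copies of an edge (as the Figure 2 caption suggests), the proportion of excess edges is taken over all generated edges, the function names are hypothetical, and no SBM sampler is included.

```python
from collections import defaultdict


def remove_excess_edges(multi_edges):
    """Hypothetical Step 2b helper: drop self-loops and repeated edges.

    Returns (simple_edges, n_removed), with each kept undirected edge
    stored once as (min, max).
    """
    simple = set()
    removed = 0
    for u, v in multi_edges:
        key = (min(u, v), max(u, v))
        if u == v or key in simple:
            removed += 1  # self-loop or parallel copy of an already kept edge
        else:
            simple.add(key)
    return simple, removed


def proportion_disconnected_clusters(edges, cluster_of):
    """Fraction of clusters whose induced subgraph is disconnected.

    cluster_of maps node -> cluster id; nodes absent from it are ignored.
    """
    members = defaultdict(list)
    for node, c in cluster_of.items():
        members[c].append(node)
    # Keep only intra-cluster edges, so the search never leaves a cluster.
    adj = defaultdict(set)
    for u, v in edges:
        if u in cluster_of and v in cluster_of and cluster_of[u] == cluster_of[v]:
            adj[u].add(v)
            adj[v].add(u)
    disconnected = 0
    for nodes in members.values():
        # Depth-first search from one member; unreached members mean a split.
        seen = {nodes[0]}
        stack = [nodes[0]]
        while stack:
            x = stack.pop()
            for y in adj[x] - seen:
                seen.add(y)
                stack.append(y)
        if len(seen) < len(nodes):
            disconnected += 1
    return disconnected / len(members) if members else 0.0


if __name__ == "__main__":
    generated = [(0, 1), (1, 0), (2, 2), (1, 2), (3, 4)]
    simple, removed = remove_excess_edges(generated)
    print(sorted(simple), removed / len(generated))  # [(0, 1), (1, 2), (3, 4)] 0.4
    clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    print(proportion_disconnected_clusters(simple, clusters))  # 0.5
```

In the usage example, cluster B is counted as disconnected because node 5 has no intra-cluster edge, even though the edge list itself is simple after cleanup.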

Theorems & Definitions (2)

  • Theorem 1
  • proof
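One standard inductive argument for Theorem 1 is sketched below. It assumes $G_m$ denotes the graph after the Figure 3 construction has processed $m \ge k+1$ vertices, and it is not necessarily the authors' proof.

```latex
% Sketch only: G_m is the graph after m >= k+1 vertices have been processed.
\textbf{Base case} ($m = k+1$). $G_{k+1} = K_{k+1}$. A cut $(S, \bar{S})$ with
$|S| = s$, $1 \le s \le k$, contains $s(k+1-s) \ge k$ edges, so
$\lambda(G_{k+1}) = k$.

\textbf{Inductive step.} $G_{m+1}$ is $G_m$ plus a vertex $v$ joined to $k$
distinct vertices of $G_m$. Take any cut $(S, \bar{S})$ of $G_{m+1}$ and
assume without loss of generality that $v \in S$.
\begin{itemize}
  \item If $S = \{v\}$, the cut consists of the $k$ edges at $v$.
  \item Otherwise $S \setminus \{v\}$ and $\bar{S}$ are both nonempty, so they
        form a cut of $G_m$. By induction it has at least $k$ edges, and each
        of them also crosses $(S, \bar{S})$ in $G_{m+1}$.
\end{itemize}
Either way the cut has at least $k$ edges, hence $\lambda(G_{m+1}) \ge k$,
and by induction $\lambda(G_n) \ge k$ for every $n \ge k+1$.
```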