The Artificial Benchmark for Community Detection with Outliers and Overlapping Communities (ABCD+$o^2$)

Jordan Barrett, Ryan DeWolfe, Bogumił Kamiński, Paweł Prałat, Aaron Smith, François Théberge

TL;DR

This paper addresses the need for scalable, analytically tractable benchmarks for overlapping community detection with outliers. It introduces ABCD+$o^2$, a six-phase generative framework that uses a hidden geometric reference layer to produce overlapping communities and a background edge process, all governed by power-law degree and community-size distributions. The authors validate the model against real networks (DBLP, Amazon, YouTube) and show controllable overlap, degree-communities correlation, and density properties, while demonstrating its utility for benchmarking through comparative experiments with multiple algorithms. The work provides a flexible, fast, and interpretable benchmark that supports systematic study of overlap and noise in community detection.

Abstract

The Artificial Benchmark for Community Detection (ABCD) graph is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs similar to the well-known LFR model but it is faster, more interpretable, and can be investigated analytically. In this paper, we use the underlying ingredients of the ABCD model, and its generalization to include outliers (ABCD+$o$), and introduce another variant that allows for overlapping communities, ABCD+$o^2$.

Paper Structure

This paper contains 14 sections, 10 equations, 16 figures, 3 tables.

Figures (16)

  • Figure 1: Example of assigning the community $C_3$ (Primary 2 in the legend) with $\hat{s}_3 = 40$ and $\eta = 1.75$ using a two-dimensional reference layer. First, we select as the seed the element furthest from the origin that does not yet have a primary community. Next, we assign $C_3$ as the primary community of the seed and of the $\hat{s}_3 - 1$ nearest neighbours of the seed that do not yet have a primary community. Finally, we expand $C_3$ by a factor of $\eta$ (from $40$ to $40 \cdot 1.75 = 70$ nodes) by taking the nearest neighbours to the primary center of mass.
  • Figure 2: Distribution of community sizes for the three real-world networks and their synthetic counterparts. The y-axis shows the empirical complementary cumulative distribution function, computed as $y = 1 - |\{C_i : |C_i| < x\}| \; / \; |\{C_i\}|$.
  • Figure 3: Distribution of the number of communities per node. The y-axis shows the empirical complementary cumulative distribution function, computed as $y = 1 - |\{v_i : |\{C_j : v_i \in C_j\}| < x\}| \; / \; |V|$.
  • Figure 4: Distribution of the size of overlaps. The y-axis shows the empirical complementary cumulative distribution function of non-empty overlaps. For example, the center column of figures corresponds to three overlapping communities and is computed as $y = 1 - |\{ \{C_i, C_j, C_k\} : |C_i \cap C_j \cap C_k| < x\}| \; / \; |\{ \{C_i, C_j, C_k\} : C_i \cap C_j \cap C_k \neq \emptyset \}|$.
  • Figure 5: Community overlap densities compared to individual community densities for various values of $\rho$. For 9 ABCD+$o^2$ graphs with $\rho$ evenly spaced in $[-0.5, 0.5]$, we compute the empirical value $\hat{\rho}$ and the distribution of density for overlapping communities. We plot $\hat{\rho}$ against the distribution of the density for the intersection and each community individually. The line corresponds to the median, and the shaded region spans the 25th to 75th percentiles. For computational reasons, we only consider overlaps of size at least 25, and to ensure we are not simply measuring the density of the smaller community, the size of the overlap must be at most half the size of the smaller community.
  • ...and 11 more figures
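
The community assignment described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration based only on that caption, not the authors' implementation: the function name `assign_community`, its arguments, and the random test points are all hypothetical. It also assumes that the expansion step keeps every primary member and adds the remaining nodes in order of distance to the primary center of mass, and that $\eta \ge 1$.

```python
import math
import random


def assign_community(points, primary, community_id, primary_size, eta):
    """Assign one community on a 2-dimensional reference layer.

    points  -- list of (x, y) coordinates, one per node
    primary -- list holding each node's primary community id, or None
    Returns (core, members): the primary members and the expanded community.
    """
    if eta < 1:
        raise ValueError("this sketch assumes eta >= 1")
    unassigned = [i for i, c in enumerate(primary) if c is None]
    if len(unassigned) < primary_size:
        raise ValueError("not enough nodes without a primary community")

    # Step 1: the seed is the unassigned node furthest from the origin.
    seed = max(unassigned, key=lambda i: math.hypot(*points[i]))

    # Step 2: the seed and its primary_size - 1 nearest unassigned
    # neighbours receive this community as their primary community.
    others = sorted(
        (i for i in unassigned if i != seed),
        key=lambda i: math.dist(points[i], points[seed]),
    )
    core = [seed] + others[: primary_size - 1]
    for i in core:
        primary[i] = community_id

    # Step 3: grow the community to round(eta * primary_size) nodes by
    # adding the nodes closest to the center of mass of the primary members.
    # Assumption: primary members always remain in the community.
    center = (
        sum(points[i][0] for i in core) / len(core),
        sum(points[i][1] for i in core) / len(core),
    )
    core_set = set(core)
    candidates = sorted(
        (i for i in range(len(points)) if i not in core_set),
        key=lambda i: math.dist(points[i], center),
    )
    extra = round(eta * primary_size) - primary_size
    members = core + candidates[:extra]
    return core, members


# Hypothetical usage: 500 random reference points, values from the caption.
rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]
primary = [None] * len(points)
core, members = assign_community(
    points, primary, community_id=3, primary_size=40, eta=1.75
)
print(len(core), len(members))  # 40 70
```

Nodes added in step 3 may already have a different primary community, which is how overlaps between communities arise in this sketch. Calling the function repeatedly with the same `primary` list assigns later communities around seeds progressively closer to the origin.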