Hierarchical community detection benchmark for heterogeneous inter-community connectivity

Brendan Cross, Boleslaw K. Szymanski

TL;DR

The paper tackles the need for robust benchmarks that capture hierarchical community structure and inter-community heterogeneity to stress-test modularity-based community detection methods against the resolution limit. It introduces the Hierarchical Generalized LFR (HGLFR), extending the LFR/GLFR framework with multiple hierarchy levels and level-specific mixing parameters, controlled by $L$, $\mu_L$, $\Delta_{\mu_L}$, and $S$. The approach preserves core distributional properties while enabling inter-community heterogeneity and potential resolution-limit phenomena, demonstrated through validation against LFR/GLFR and analysis of detectability across hierarchy levels. The work provides a more realistic benchmark for evaluating detection algorithms in multi-scale networks and highlights areas for further refinement in degree-heterogeneity control and edge assignment across hierarchical levels.

Abstract

We introduce a new tool for community detection research: a network generator whose parameters control the structure of the networks it creates. Network scientists designing novel community detection algorithms typically use synthetically generated benchmarks that contain the community structures they intend to detect, and they scale the benchmark networks across size and density. Currently available benchmarks use generators limited to the properties of the LFR and GLFR networks. We improve on these with a new hierarchical benchmark, the HGLFR, which preserves the properties of the LFR and GLFR while extending them to include heterogeneous inter-community connectivity. We show that networks generated by this benchmark contain structures that trigger the resolution limit while maintaining assortative connectivity.

Paper Structure

This paper contains 10 sections, 1 equation, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1:
  • Figure 2: Sampled community hierarchy. This figure highlights how the levels of the hierarchy are generated. We begin with a set of communities composed of nodes with prescribed degrees but no assigned connections. For each community, we calculate the external degree required at each level of the hierarchy to match that level's connectivity parameter. We then perform a random merging step, beginning from the ground-truth communities: looping over the communities, we merge each one into an existing community with probability $S$. After groupings are assigned, we perform a shuffling step in which communities are moved between hierarchical groups based on their ability to satisfy their required external connectivity.
  • Figure 3: Generator degree sequences. Degree distributions produced by the LFR, GLFR, and HGLFR generators, shown with a reference sampled power law.
  • Figure 4: Generated networks by resolution window distance. This figure shows the resolution window distance, $D$, for networks generated by the LFR (a), the GLFR (b), and the HGLFR (c) as a function of $\mu$ and the minimum intra-community connectivity $\min(\Omega_{i,i})$. The green line indicates the boundary of assortative network communities ($\mu = 0.5$ or $\Omega_{i,i} < 1$). The red line indicates the boundary of negative distance.
  • Figure 5: Community detection algorithm performance by $\mu$
  • ...and 2 more figures
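
The random merging step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `merge_level` and `build_hierarchy` are hypothetical, `n_levels` is assumed to play the role of $L$, the existing group to merge into is assumed to be chosen uniformly at random, and the shuffling step that reassigns communities by external connectivity is omitted.

```python
import random


def merge_level(units, S, rng):
    """One random merging pass over a list of unit ids.

    Each unit joins an existing group with probability S; otherwise it
    starts a new group. Assumption: the target group is chosen uniformly.
    """
    groups = []
    for unit in units:
        if groups and rng.random() < S:
            rng.choice(groups).append(unit)
        else:
            groups.append([unit])
    return groups


def build_hierarchy(n_communities, n_levels, S, seed=0):
    """Return a list of levels; each level is a list of groups, and each
    group is a list of ground-truth community ids.

    Level 0 holds the ground-truth communities. Each later level merges
    the groups of the level below it (hypothetical reading of the caption).
    """
    rng = random.Random(seed)
    levels = [[[c] for c in range(n_communities)]]
    for _ in range(n_levels - 1):
        current = levels[-1]
        merged = merge_level(list(range(len(current))), S, rng)
        # Expand indices of lower-level groups back into community ids.
        levels.append([[c for idx in group for c in current[idx]]
                       for group in merged])
    return levels


if __name__ == "__main__":
    for depth, level in enumerate(build_hierarchy(12, 3, 0.5, seed=1)):
        print(depth, level)
```

In this sketch a larger $S$ produces fewer, larger groups per level: $S = 0$ leaves every community on its own, and $S = 1$ collapses each level into a single group.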