
Enhancing Stability and Assessing Uncertainty in Community Detection through a Consensus-based Approach

Fabio Morea, Domenico De Stefano

TL;DR

CCD introduces a consensus-based framework that stabilizes and interprets community detection results by combining multiple partitions obtained on randomly permuted copies $G^*$ of the input graph $G$. It quantifies node-level uncertainty via an uncertainty coefficient $\gamma$ and mitigates input-ordering bias, producing a final representation $\widetilde{C^A}$ that consists of community labels and $\gamma$ values. The method is compatible with any base community detection algorithm and uses a co-occurrence matrix $D$, a similarity threshold $q$, and a block threshold $p$ to prune and aggregate partitions. Evaluations on synthetic benchmarks (LFR and RC) show that CCD improves repeatability (higher mean pairwise $NMI$ and a stable number of communities $k$) and supports outlier handling through three strategies: incorporate, highlight, or group. The work improves the interpretability and reliability of network community detection, and the authors release their code as open source.
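The TL;DR describes the consensus pipeline only at a high level. The following is a minimal, stdlib-only Python sketch of the general idea under stated assumptions; it is not the authors' implementation. Assumptions: asynchronous label propagation stands in for the base algorithm (CCD itself is algorithm-agnostic); each run sees a randomly permuted copy of the graph; the co-occurrence matrix $D$ stores the fraction of runs in which two nodes share a community; consensus communities are the connected components of node pairs whose $D$ value reaches a hypothetical `threshold` (loosely analogous to, but not necessarily the same as, the paper's block threshold $p$); the similarity threshold $q$ is not modelled; and `gamma` is a hypothetical per-node proxy, not the paper's definition of $\gamma$. All function names are illustrative.

```python
import random
from collections import defaultdict
from itertools import combinations


def permuted_copy(adj, rng):
    """Copy of the graph with nodes and neighbour lists in random order (ids kept)."""
    nodes = list(adj)
    rng.shuffle(nodes)
    return {v: rng.sample(adj[v], len(adj[v])) for v in nodes}


def label_propagation(adj, rng, sweeps=20):
    """Asynchronous label propagation: an order-sensitive stand-in base algorithm."""
    labels = {v: v for v in adj}
    for _ in range(sweeps):
        changed = False
        for v in adj:  # visiting order follows the (permuted) input order
            if not adj[v]:
                continue
            counts = defaultdict(int)
            for u in adj[v]:
                counts[labels[u]] += 1
            best = max(counts.values())
            choice = rng.choice(sorted(lab for lab, c in counts.items() if c == best))
            if choice != labels[v]:
                labels[v], changed = choice, True
        if not changed:
            break
    return labels


def co_occurrence(partitions, nodes):
    """D[{i, j}] = fraction of partitions in which nodes i and j share a community."""
    t = len(partitions)
    return {frozenset((i, j)): sum(P[i] == P[j] for P in partitions) / t
            for i, j in combinations(nodes, 2)}


def consensus(D, nodes, threshold):
    """Consensus communities: connected components of pairs with D >= threshold."""
    nbrs = {v: [] for v in nodes}
    for pair, d in D.items():
        if d >= threshold:
            i, j = tuple(pair)
            nbrs[i].append(j)
            nbrs[j].append(i)
    labels, next_label = {}, 0
    for v in nodes:
        if v in labels:
            continue
        labels[v], stack = next_label, [v]
        while stack:  # depth-first search over the thresholded graph
            x = stack.pop()
            for y in nbrs[x]:
                if y not in labels:
                    labels[y] = next_label
                    stack.append(y)
        next_label += 1
    return labels


def uncertainty(D, labels):
    """Hypothetical score: 1 - mean co-occurrence with consensus co-members (1.0 if alone)."""
    gamma = {}
    for v in labels:
        mates = [u for u in labels if u != v and labels[u] == labels[v]]
        gamma[v] = (1 - sum(D[frozenset((v, u))] for u in mates) / len(mates)) if mates else 1.0
    return gamma


def ccd_sketch(adj, t=50, threshold=0.5, seed=0):
    rng = random.Random(seed)
    nodes = list(adj)
    partitions = [label_propagation(permuted_copy(adj, rng), rng) for _ in range(t)]
    D = co_occurrence(partitions, nodes)
    labels = consensus(D, nodes, threshold)
    return labels, uncertainty(D, labels)


# Usage: two 5-node cliques joined by a single bridge edge (4-5).
graph = {v: [] for v in range(10)}
for block in (range(0, 5), range(5, 10)):
    for i, j in combinations(block, 2):
        graph[i].append(j)
        graph[j].append(i)
graph[4].append(5)
graph[5].append(4)
labels, gamma = ccd_sketch(graph, t=50, threshold=0.5, seed=1)
print(labels)
print({v: round(g, 2) for v, g in gamma.items()})
```

Keeping node identities fixed and only shuffling their order means each run's partition can be compared directly on $G$; a true relabelling of the nodes would additionally require mapping every partition back through the inverse permutation.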

Abstract

Complex data in social and natural sciences find effective representation through networks, wherein quantitative and categorical information can be associated with nodes and connecting edges. The internal structure of networks can be explored using unsupervised machine learning methods known as community detection algorithms. The process of community detection is inherently subject to uncertainty as algorithms utilize heuristic approaches and randomised procedures to explore vast solution spaces, resulting in non-deterministic outcomes and variability in detected communities across multiple runs. Moreover, many algorithms are not designed to identify outliers and may fail to take into account that a network is an unordered mathematical entity. The main aim of our work is to address these issues through a consensus-based approach by introducing a new framework called Consensus Community Detection (CCD). Our method can be applied to different community detection algorithms, allowing the quantification of uncertainty for the whole network as well as for each node, and providing three strategies for dealing with outliers: incorporate, highlight, or group. The effectiveness of our approach is evaluated on artificial benchmark networks.
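Figure 2 illustrates the three outlier strategies named in the abstract. The sketch below is a hedged illustration rather than the authors' code: it applies each strategy to an already-computed partition. It assumes integer community labels and takes the set of outlier nodes as given (how CCD flags outliers is not reproduced here); the function name `handle_outliers` and the majority-vote rule used for "incorporate" are hypothetical.

```python
from collections import Counter


def handle_outliers(labels, outliers, adj, strategy):
    """Apply one outlier strategy to a partition {node: int community label}.

    'incorporate': move each outlier to the most common community among its
                   non-outlier neighbours (a hypothetical rule; kept if it has none);
    'highlight':   give each outlier its own single-node community;
    'group':       put all outliers into one shared outliers' community.
    """
    new = dict(labels)
    fresh = max(labels.values()) + 1  # first unused integer label
    if strategy == "incorporate":
        for v in outliers:
            votes = Counter(labels[u] for u in adj[v] if u not in outliers)
            if votes:
                new[v] = votes.most_common(1)[0][0]
    elif strategy == "highlight":
        for k, v in enumerate(sorted(outliers)):
            new[v] = fresh + k
    elif strategy == "group":
        for v in outliers:
            new[v] = fresh
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return new


# Usage: two triangles (communities 0 and 1) plus two flagged outliers, 6 and 7.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (6, 2), (6, 4), (6, 5), (7, 0), (7, 1), (7, 3)]
adj = {v: [] for v in range(8)}
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)
labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 0, 7: 1}
for strategy in ("incorporate", "highlight", "group"):
    print(strategy, handle_outliers(labels, {6, 7}, adj, strategy))
```

Votes in "incorporate" are counted against the original labels, so the result does not depend on the order in which outliers are processed.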

Paper Structure

This paper contains 14 sections, 4 equations, 11 figures, and 1 algorithm.

Figures (11)

  • Figure 1: Variability in the results of selected community detection algorithms on an LFR benchmark network with a nominal mixing parameter $\mu = 0.40$. Top: distribution of the number of communities. Middle: similarity between pairs of partitions. Bottom: scatterplot of modularity versus similarity.
  • Figure 2: Three alternative strategies to manage outliers: incorporate (left), highlight as single-node communities (center), or group into an outliers' community (right). The top row shows the network; the bottom row shows a graph of the communities, labeled with the number of nodes in each community.
  • Figure 3: An illustration of input-ordering bias, using an RC benchmark network with $k_0 = 4$, $s = 5$, bridges, and a central node. Above: label assigned to the central node by various algorithms applied $t = 1000$ times to network $G$. Below: labels assigned to the central node when the algorithms are applied to $G^*$, a copy of $G$ randomly permuted at each iteration. Labels: S = the central node is highlighted as a single-node community; $C_i$ = the central node is incorporated into community $i$.
  • Figure 4: Stability of CCD results as a function of the number of iterations $t \in \{10, 20, 50, 100, 200, 500\}$. Results of single trials ($t = 1$) are highlighted in red. Test on an LFR network with $\mu = 0.3$ and CCD parameters $p = 0.8$, $q = 0.5$. Stability is measured as the mean similarity between pairs of solutions, $S = \operatorname{mean}_{i<j} NMI(C_i, C_j)$.
  • Figure 5: Example of CCD on Zachary's Karate Club network (weighted). a) single trial of Louvain (LV) with resolution $r = 0.5$; b) single trial of LV, $r = 0.8$; c) single trial of LV, $r = 1.0$; d) CCD with $t = 100$ and $r = 0.5$; e) CCD with $t = 100$ and $r \in [0.5, 1.0]$. The uncertainty coefficient $\gamma$ is available only for CCD.
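Figures 1 and 4 measure stability as the similarity between pairs of partitions, summarised as the mean pairwise $NMI$. A minimal stdlib-only sketch of that measure follows. The NMI normalisation is not specified here, so the arithmetic-mean variant used below (the default of scikit-learn's `normalized_mutual_info_score`) is an assumption, as are the function names and the convention of returning 1.0 when both partitions consist of a single community.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (natural log) of a partition given as a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """NMI with arithmetic-mean normalisation: 2 I(A;B) / (H(A) + H(B))."""
    n = len(a)
    joint, ca, cb = Counter(zip(a, b)), Counter(a), Counter(b)
    mi = sum(c / n * math.log(n * c / (ca[x] * cb[y])) for (x, y), c in joint.items())
    denom = entropy(a) + entropy(b)
    return 1.0 if denom == 0 else 2 * mi / denom  # two single-community partitions agree


def stability(partitions):
    """Mean NMI over all unordered pairs of partitions (needs at least two)."""
    scores = [nmi(pa, pb) for pa, pb in combinations(partitions, 2)]
    return sum(scores) / len(scores)


# Usage: three partitions of the same four nodes (list position = node index).
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
print(stability(runs))
```

Because NMI ignores label names, the first two runs count as identical even though their labels are swapped; only the third run, which splits the nodes differently, lowers the score.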