Enhancing Stability and Assessing Uncertainty in Community Detection through a Consensus-based Approach
Fabio Morea, Domenico De Stefano
TL;DR
CCD introduces a consensus-based framework that stabilizes and helps interpret community detection results by aggregating multiple partitions obtained on randomly permuted graphs $G^*$. It quantifies node-level uncertainty through an uncertainty coefficient $\gamma$ and mitigates input-ordering bias, producing a final representation $\widetilde{C^A}$ that consists of community labels and $\gamma$ values. The method is compatible with arbitrary base algorithms and uses a co-occurrence matrix $D$, a similarity threshold $q$, and a block threshold $p$ to prune and aggregate partitions. Evaluations on synthetic benchmarks (LFR and RC) show that CCD improves repeatability (higher mean pairwise $\mathrm{NMI}$ and a stable number of communities $k$) and supports outlier handling through three strategies: incorporate, highlight, or group. The work improves the interpretability and reliability of network community detection; open-source code is available at the provided repository.
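To make the aggregation idea concrete, here is a minimal sketch of how a co-occurrence matrix $D$ can be built from several partitions and thresholded at $q$. It assumes each run's partition (from any base algorithm applied to a randomly permuted copy of the graph) has already been mapped back to the original node order. The function names, the use of connected components of the thresholded matrix as consensus communities, and the per-node score standing in for $\gamma$ are illustrative assumptions, not the authors' definitions; the block threshold $p$ is not modelled here.

```python
from itertools import combinations


def co_occurrence(partitions):
    """Fraction of runs in which each pair of nodes shares a community.

    partitions[r][i] is the label of node i in run r, already mapped back
    to the original node order (assumption: relabelling after permutation
    has been done upstream).
    """
    n = len(partitions[0])
    m = len(partitions)
    D = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                D[i][j] += 1.0 / m
                D[j][i] += 1.0 / m
    for i in range(n):
        D[i][i] = 1.0  # a node always co-occurs with itself
    return D


def consensus(D, q):
    """Hypothetical consensus step: connected components of the graph
    that links node pairs with co-occurrence D[i][j] >= q."""
    n = len(D)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = current
        stack = [start]
        while stack:  # depth-first traversal of the thresholded graph
            i = stack.pop()
            for j in range(n):
                if label[j] == -1 and D[i][j] >= q:
                    label[j] = current
                    stack.append(j)
        current += 1
    return label


def uncertainty(D, labels):
    """Hypothetical stand-in for gamma: 1 minus the mean co-occurrence of
    a node with the other members of its consensus community."""
    gamma = []
    for i, c in enumerate(labels):
        peers = [j for j, cj in enumerate(labels) if cj == c and j != i]
        if not peers:
            gamma.append(1.0)  # singleton: no support from any peer
        else:
            gamma.append(1.0 - sum(D[i][j] for j in peers) / len(peers))
    return gamma


if __name__ == "__main__":
    # Four runs on five nodes; node 2 switches community in the third run.
    runs = [[0, 0, 0, 1, 1], [0, 0, 0, 1, 1], [0, 0, 1, 1, 1], [0, 0, 0, 1, 1]]
    D = co_occurrence(runs)
    labels = consensus(D, q=0.5)
    print(labels)                  # [0, 0, 0, 1, 1]
    print(uncertainty(D, labels))  # [0.125, 0.125, 0.25, 0.0, 0.0]
```

In this toy example node 2 receives the highest score because it left its usual community in one of the four runs, which is the kind of node-level instability a coefficient such as $\gamma$ is meant to expose.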
Abstract
Complex data in the social and natural sciences can be effectively represented as networks, in which quantitative and categorical information is associated with nodes and the edges connecting them. The internal structure of networks can be explored using unsupervised machine learning methods known as community detection algorithms. Community detection is inherently subject to uncertainty: algorithms use heuristic approaches and randomised procedures to explore vast solution spaces, which leads to non-deterministic outcomes and variability in the detected communities across multiple runs. Moreover, many algorithms are not designed to identify outliers and may fail to account for the fact that a network is an unordered mathematical entity, so their results can depend on the order in which nodes are presented. The main aim of our work is to address these issues through a consensus-based approach, by introducing a new framework called Consensus Community Detection (CCD). Our method can be applied to different community detection algorithms; it allows the quantification of uncertainty for the whole network as well as for each node, and provides three strategies for dealing with outliers: incorporate, highlight, or group. The effectiveness of our approach is evaluated on artificial benchmark networks.
