Deep Clustering via Gradual Community Detection
Tianyu Cheng, Qun Chen
TL;DR
DCvCD tackles weak supervision in deep clustering through gradual community detection: it over-clusters the data into many pseudo-communities and then iteratively merges them to improve pseudo-label purity. The method brings a cluster-network-analysis perspective to clustering, using Leiden-based community detection, a merging score that combines modularity, local connectivity, and proximity, and an adapted InfoNCE loss for fine-tuning the backbone. Experiments on five image benchmarks show state-of-the-art performance, and ablation studies show that the network perspective improves pseudo-label purity, underscoring the value of global cluster structure for self-supervision. Because the approach builds on existing backbones with minimal changes, it is practical to adopt; future directions include multimodal extensions and tighter integration between the backbone and community detection.
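The TL;DR mentions an adapted InfoNCE loss for fine-tuning the backbone, but the adaptation itself is not specified here. The sketch below therefore shows only the standard InfoNCE objective for one anchor, plus a hypothetical batch variant in which positives are samples sharing the anchor's pseudo-label and negatives are all samples with a different pseudo-label. The names `info_nce` and `pseudo_label_info_nce`, the cosine similarity, and the default temperature of 0.5 are illustrative assumptions, not the paper's formulation.

```python
import math


def _cosine(u, v):
    """Cosine similarity with a small floor on the norm product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Standard InfoNCE for one anchor:
    -log( exp(s+/t) / (exp(s+/t) + sum_n exp(s_n/t)) ), with cosine similarity s.
    """
    logits = [_cosine(anchor, positive) / temperature]
    logits += [_cosine(anchor, n) / temperature for n in negatives]
    top = max(logits)  # log-sum-exp trick for numerical stability
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_denominator - logits[0]


def pseudo_label_info_nce(embeddings, labels, temperature=0.5):
    """Hypothetical adaptation: positives share the anchor's pseudo-label,
    negatives are all samples with a different pseudo-label."""
    losses = []
    for i, (z_i, y_i) in enumerate(zip(embeddings, labels)):
        negatives = [z for z, y in zip(embeddings, labels) if y != y_i]
        for j, (z_j, y_j) in enumerate(zip(embeddings, labels)):
            if j != i and y_j == y_i:
                losses.append(info_nce(z_i, z_j, negatives, temperature))
    # No same-label pairs means no positives, so the loss is defined as 0.
    return sum(losses) / len(losses) if losses else 0.0
```

Averaging over every same-label pair treats each pseudo-community as a set of mutual positives, which illustrates why pseudo-label purity matters: an impure community injects false positives into the loss.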
Abstract
Deep clustering is an essential task in modern artificial intelligence, aiming to partition a set of data samples into a given number of homogeneous groups (i.e., clusters). Recent studies have proposed increasingly advanced deep neural networks and training strategies for deep clustering, improving performance. However, deep clustering remains challenging because supervision signals are inadequate. Building upon existing representation learning backbones, this paper proposes a novel clustering strategy: gradual community detection. It initializes clustering by partitioning samples into many pseudo-communities and then gradually expands clusters by merging communities. Unlike existing clustering strategies, community detection brings a new perspective, cluster network analysis, into the clustering process. This perspective can leverage global structural characteristics to enhance cluster pseudo-label purity, which is critical to the performance of self-supervision. We have implemented the proposed approach on top of popular backbones and evaluated its efficacy on benchmark image datasets. Extensive experiments show that the proposed clustering strategy improves on state-of-the-art (SOTA) performance. Our ablation study also demonstrates that the new network perspective improves community pseudo-label purity, resulting in better self-supervision.
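To make the merge-based expansion concrete, here is a minimal sketch of gradual community merging under stated assumptions. It takes an over-clustered initial partition as given (the paper obtains it with Leiden-based community detection, which is not reimplemented here) and greedily merges the highest-scoring pair of communities until k remain. The score adds the standard modularity gain of merging communities a and b, dQ = w_ab/m - D_a*D_b/(2m^2), where w_ab is the edge weight between them, D_a and D_b are their total degrees, and m is the total edge weight, to two hypothetical terms: a local-connectivity term w_ab / min(D_a, D_b) and a proximity term given by the cosine similarity of community centroids. The function name `gradual_merge`, the weights `beta` and `gamma`, and both extra terms are illustrative assumptions, not the paper's merging score.

```python
import math
from itertools import combinations


def _cosine(u, v):
    """Cosine similarity with a small floor on the norm product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)


def gradual_merge(edges, labels, embeddings, k, beta=0.5, gamma=0.5):
    """Greedily merge pseudo-communities until only k remain (illustrative sketch).

    edges:       dict {(u, v): weight} for an undirected graph without self-loops
    labels:      initial pseudo-community id (int) for each node
    embeddings:  one feature vector per node
    beta, gamma: hypothetical weights for the connectivity and proximity terms
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    m = sum(edges.values())  # total edge weight
    if m <= 0:
        raise ValueError("graph must have positive total edge weight")
    labels = list(labels)
    while len(set(labels)) > k:
        comms = sorted(set(labels))
        # Total degree per community (D_c) and edge weight between communities (w_ab).
        degree = {c: 0.0 for c in comms}
        between = {}
        for (u, v), w in edges.items():
            a, b = labels[u], labels[v]
            degree[a] += w
            degree[b] += w
            if a != b:
                key = (min(a, b), max(a, b))
                between[key] = between.get(key, 0.0) + w
        # Mean embedding of each community.
        centroid = {}
        for c in comms:
            members = [z for z, y in zip(embeddings, labels) if y == c]
            centroid[c] = [sum(col) / len(members) for col in zip(*members)]
        best, best_score = None, -math.inf
        for a, b in combinations(comms, 2):
            w_ab = between.get((a, b), 0.0)
            gain = w_ab / m - degree[a] * degree[b] / (2 * m * m)  # modularity gain
            conn = w_ab / max(min(degree[a], degree[b]), 1e-12)     # hypothetical term
            prox = _cosine(centroid[a], centroid[b])                # hypothetical term
            score = gain + beta * conn + gamma * prox
            if score > best_score:
                best, best_score = (a, b), score
        a, b = best
        labels = [a if y == b else y for y in labels]  # absorb community b into a
    return labels
```

Recomputing every pairwise score after each merge keeps the sketch short; a practical implementation would update only the scores touched by the merged pair.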
