Adaptive Self-supervised Robust Clustering for Unstructured Data with Unknown Cluster Number
Chen-Lu Ding, Jiancan Wu, Wei Lin, Shiyang Shen, Xiang Wang, Yancheng Yuan
TL;DR
ASRC clusters unstructured data without requiring the number of clusters in advance. It jointly learns an adaptive graph and clustering-friendly representations through a graph auto-encoder with contrastive learning, and uses robust continuous clustering (RCC) to generate prototypes for negative sampling. The method integrates enhanced adaptive graph structure learning (EadaGAE), clustering-guided self-supervised learning, and RCC-based clustering into a unified unsupervised objective $\mathcal{L}_{ASRC}=\mathcal{L}_{GAE}+\beta\mathcal{L}_{ssl}$, where $\beta$ weights the self-supervised term, and iteratively updates the graph, edge weights, and representations. On seven benchmarks, ASRC surpasses baselines that require the number of clusters as input and is robust to graph noise; ablations confirm the value of adaptive graph learning and debiased negative sampling. The approach offers a practical, label-free solution for clustering diverse unstructured data in real-world settings.
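The unified objective $\mathcal{L}_{ASRC}=\mathcal{L}_{GAE}+\beta\mathcal{L}_{ssl}$ can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: $\mathcal{L}_{GAE}$ is assumed to be an inner-product decoder scored by binary cross-entropy against the adjacency matrix, and $\mathcal{L}_{ssl}$ is assumed to be an InfoNCE-style loss in which each embedding's positive is the prototype (mean embedding) of its RCC cluster and the other clusters' prototypes act as negatives. The function names, the temperature `tau`, and the toy inputs are hypothetical.

```python
import numpy as np

def gae_loss(Z, A, eps=1e-9):
    """Assumed L_GAE: BCE between adjacency A and inner-product decoder sigmoid(Z Z^T)."""
    A_hat = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
    A_hat = np.clip(A_hat, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(A * np.log(A_hat) + (1.0 - A) * np.log(1.0 - A_hat)))

def prototype_ssl_loss(Z, labels, tau=0.5):
    """Assumed L_ssl: InfoNCE with the own-cluster prototype as positive and
    the other clusters' prototypes as negatives (labels would come from RCC)."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    clusters = np.unique(labels)  # sorted cluster ids
    P = np.stack([Zn[labels == c].mean(axis=0) for c in clusters])
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    logits = Zn @ P.T / tau                      # (n, num_clusters) scaled cosine sims
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    own = np.searchsorted(clusters, labels)      # column of each sample's own prototype
    return float(-log_prob[np.arange(len(Z)), own].mean())

def asrc_objective(Z, A, labels, beta=1.0):
    """L_ASRC = L_GAE + beta * L_ssl under the assumptions above."""
    return gae_loss(Z, A) + beta * prototype_ssl_loss(Z, labels)
```

Because the negatives are prototypes of other RCC clusters rather than randomly drawn samples, they are less likely to be false negatives; this is one plausible reading of the debiased negative sampling described above.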
Abstract
We introduce a novel self-supervised deep clustering approach for unstructured data that requires no prior knowledge of the number of clusters, termed Adaptive Self-supervised Robust Clustering (ASRC). In particular, ASRC adaptively learns the graph structure and edge weights to capture both local and global structural information. The learned graph enables ASRC to learn clustering-friendly feature representations with an enhanced graph auto-encoder trained using a contrastive learning technique. ASRC further leverages the clustering results adaptively obtained by robust continuous clustering (RCC) to generate prototypes for negative sampling, which promotes consistency among positive pairs and widens the gap between positive and negative samples. ASRC obtains the final clustering results by applying RCC to the learned feature representations together with their consistent graph structure and edge weights. Extensive experiments on seven benchmark datasets demonstrate the efficacy of ASRC and its superior performance over other popular clustering models. Notably, ASRC even outperforms methods that rely on prior knowledge of the number of clusters, highlighting its effectiveness in addressing the challenges of clustering unstructured data.
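To show how RCC can produce clusters without a preset cluster count, the sketch below follows the general scheme of robust continuous clustering (Shah and Koltun, 2017): each point $x_i$ gets a representative $u_i$, and the method minimizes $\frac{1}{2}\sum_i \lVert x_i-u_i\rVert^2+\frac{\lambda}{2}\sum_{(p,q)\in E} w_{pq}\,\rho(\lVert u_p-u_q\rVert)$ with the Geman-McClure penalty $\rho(y)=\mu y^2/(\mu+y^2)$, alternating a closed-form line-process update with a linear solve. Clusters are read off as connected components of edges whose representatives end up close together. Several simplifications are assumptions, not ASRC's procedure: a symmetric k-nearest-neighbor graph, uniform edge weights (ASRC instead learns them adaptively), a fixed $\lambda$, a halving schedule for $\mu$ with a floor `mu_min`, and a distance threshold `delta`; all function and parameter names are hypothetical.

```python
import numpy as np

def knn_edges(X, k):
    """Undirected edge list of a symmetric k-nearest-neighbor graph (assumed graph)."""
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D, np.inf)
    nbrs = np.argsort(D, axis=1)[:, :k]
    edges = {(min(i, int(j)), max(i, int(j))) for i in range(len(X)) for j in nbrs[i]}
    return np.array(sorted(edges))

def rcc(X, k=2, lam=1.0, iters=30, mu_min=1e-2, delta=1.0):
    """Simplified RCC sketch: returns (labels, representatives U)."""
    n = len(X)
    E = knn_edges(X, k)
    p, q = E[:, 0], E[:, 1]
    w = np.ones(len(E))  # hypothetical uniform weights; ASRC learns these adaptively
    U = X.astype(float).copy()
    mu = 3.0 * ((X[p] - X[q]) ** 2).sum(1).max()  # assumed initialization heuristic
    for _ in range(iters):
        d2 = ((U[p] - U[q]) ** 2).sum(1)
        l = (mu / (mu + d2)) ** 2  # closed-form line-process update for Geman-McClure
        W = np.zeros((n, n))
        W[p, q] = w * l
        W[q, p] = w * l
        L = np.diag(W.sum(1)) - W  # weighted graph Laplacian
        U = np.linalg.solve(np.eye(n) + lam * L, X)  # minimizer of the quadratic in U
        mu = max(mu / 2.0, mu_min)  # graduated non-convexity: sharpen the penalty
    # Clusters = connected components over edges whose representatives are close.
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    dist = np.sqrt(((U[p] - U[q]) ** 2).sum(1))
    for a, b, d in zip(p, q, dist):
        if d < delta:
            parent[find(int(a))] = find(int(b))
    remap = {}
    labels = np.array([remap.setdefault(find(i), len(remap)) for i in range(n)])
    return labels, U
```

On two well-separated groups of five points each, `rcc(X, k=2, delta=1.0)` returns two distinct labels even though no cluster count is supplied; the count emerges from the graph and the merged representatives.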
