Adaptive Self-supervised Robust Clustering for Unstructured Data with Unknown Cluster Number

Chen-Lu Ding, Jiancan Wu, Wei Lin, Shiyang Shen, Xiang Wang, Yancheng Yuan

TL;DR

ASRC addresses clustering unstructured data without requiring the number of clusters by jointly learning an adaptive graph and clustering-friendly representations through a graph auto-encoder with contrastive learning, and by using robust continuous clustering (RCC) to generate prototypes for negative sampling. The method integrates enhanced adaptive graph structure learning (EadaGAE), clustering-guided self-supervised learning, and RCC-based clustering into a unified unsupervised objective $\mathcal{L}_{ASRC}=\mathcal{L}_{GAE}+\beta\mathcal{L}_{ssl}$, iteratively updating the graph, weights, and representations. Empirical results on seven benchmarks show that ASRC surpasses baselines requiring prior cluster numbers and demonstrates robustness to graph noise, with ablations confirming the value of adaptive graph learning and debiased negative sampling. The approach offers a practical, label-free solution for clustering diverse unstructured data in real-world settings.
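As a rough illustration of how the two terms of $\mathcal{L}_{ASRC}$ could be combined, the following NumPy sketch pairs each node's two views as a positive and uses cluster prototypes (mean embeddings of the other clusters) as negatives. The function names, the InfoNCE-style form of the loss, the temperature `tau`, and the use of cluster means as prototypes are all assumptions made for illustration; the summary above does not specify the exact form of $\mathcal{L}_{ssl}$, and `labels` simply stands in for the output of a clusterer such as RCC.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length (guarding against zero rows)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)

def cluster_prototypes(z, labels):
    """Hypothetical helper: one prototype per cluster, taken as the mean embedding."""
    ids = np.unique(labels)
    protos = np.stack([z[labels == c].mean(axis=0) for c in ids])
    return ids, protos

def prototype_contrastive_loss(z1, z2, labels, tau=0.5):
    """InfoNCE-style loss (an assumed form, not the paper's stated one).

    Positive pair: the same node in view 1 and view 2.
    Negatives: prototypes of every cluster except the node's own.
    """
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    ids, protos = cluster_prototypes(z1, labels)
    protos = l2_normalize(protos)

    pos = np.exp(np.sum(z1 * z2, axis=1) / tau)   # shape (n,)
    neg = np.exp(z1 @ protos.T / tau)             # shape (n, n_clusters)
    own = labels[:, None] == ids[None, :]         # mask out each node's own cluster
    neg_sum = np.where(own, 0.0, neg).sum(axis=1)
    return float(np.mean(-np.log(pos / (pos + neg_sum))))

def asrc_objective(l_gae, l_ssl, beta=1.0):
    """L_ASRC = L_GAE + beta * L_ssl, as written in the summary."""
    return l_gae + beta * l_ssl
```

Excluding a node's own prototype from the negatives is one simple way to reduce false negatives; it is shown here only to make the idea of clustering-guided negative sampling concrete.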

Abstract

We introduce a novel self-supervised deep clustering approach tailored for unstructured data without requiring prior knowledge of the number of clusters, termed Adaptive Self-supervised Robust Clustering (ASRC). In particular, ASRC adaptively learns the graph structure and edge weights to capture both local and global structural information. The obtained graph enables us to learn clustering-friendly feature representations with an enhanced graph auto-encoder trained with a contrastive learning technique. ASRC further leverages the clustering results adaptively obtained by robust continuous clustering (RCC) to generate prototypes for negative sampling, which promotes consistency among positive pairs and enlarges the gap between positive and negative samples. ASRC obtains the final clustering results by applying RCC to the learned feature representations together with their consistent graph structure and edge weights. Extensive experiments on seven benchmark datasets demonstrate the efficacy of ASRC and its superior performance over other popular clustering models. Notably, ASRC even outperforms methods that rely on prior knowledge of the number of clusters, highlighting its effectiveness in addressing the challenges of clustering unstructured data.
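To make the idea of adaptively learned edge weights more concrete, here is a minimal sketch of a $k$-sparse weighted graph built from pairwise distances. It uses the closed-form weights common in adaptive-neighbor graph learning, $p_{ij} = (d_{i,k+1} - d_{ij}) / \sum_{j'=1}^{k} (d_{i,k+1} - d_{ij'})$ over the $k$ nearest neighbors of node $i$. Whether ASRC uses exactly this formula is an assumption; the abstract only states that the graph structure and edge weights are learned adaptively. The function name is hypothetical.

```python
import numpy as np

def adaptive_neighbor_weights(x, k):
    """Row-stochastic k-sparse graph weights (assumed closed form, see lead-in).

    x: (n, d) array of features or embeddings; requires k <= n - 2.
    Returns p with p[i, j] > 0 only for the k nearest neighbors j of node i.
    """
    n = x.shape[0]
    if not 1 <= k <= n - 2:
        raise ValueError("k must satisfy 1 <= k <= n - 2")

    # Squared Euclidean distances between all pairs of rows.
    sq = np.sum(x ** 2, axis=1)
    d = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    np.fill_diagonal(d, np.inf)           # a node is never its own neighbor

    order = np.argsort(d, axis=1)         # neighbors sorted by distance
    p = np.zeros((n, n))
    for i in range(n):
        nbrs = order[i, :k]
        d_k1 = d[i, order[i, k]]          # distance to the (k+1)-th neighbor
        w = d_k1 - d[i, nbrs]             # non-negative by construction
        s = w.sum()
        # Fall back to uniform weights if all k+1 distances are tied.
        p[i, nbrs] = w / s if s > 0 else 1.0 / k
    return p
```

Calling the function with a larger `k` yields a denser graph, which mirrors the idea of updating the graph by increasing the sparsity parameter; a symmetric graph can be obtained afterwards as `(p + p.T) / 2` if the downstream model expects one.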

Paper Structure (27 sections, 18 equations, 6 figures, 4 tables, 2 algorithms)

This paper contains 27 sections, 18 equations, 6 figures, 4 tables, 2 algorithms.

Figures (6)

  • Figure 1: The framework of ASRC. First, we augment features to obtain multiple views of the data and solve the optimization problem for the edge weights $p_{ij}$ to generate the weighted graph. Then, we apply GAE to obtain embeddings. The graph structure is updated adaptively by increasing the sparsity parameter $k$. After the graph structure is updated, we incorporate contrastive learning to retrain GAE. Once the training process converges, we use the learned graph structure, edge weights, and representations as inputs to robust continuous clustering to obtain clustering results. We then use the clustering results to guide the selection of negative samples and update the clustering results until convergence.
  • Figure 2: Parameter sensitivity of ASRC on UMIST. The results are presented in percentage format.
  • Figure : (a) Raw features
  • Figure : (a) Raw features
  • Figure : (b) SDCN
  • ...and 1 more figures