Adaptive Self-supervised Robust Clustering for Unstructured Data with Unknown Cluster Number

Chen-Lu Ding; Jiancan Wu; Wei Lin; Shiyang Shen; Xiang Wang; Yancheng Yuan

TL;DR

ASRC addresses clustering unstructured data without requiring the number of clusters by jointly learning an adaptive graph and clustering-friendly representations through a graph auto-encoder with contrastive learning, and by using robust continuous clustering (RCC) to generate prototypes for negative sampling. The method integrates enhanced adaptive graph structure learning (EadaGAE), clustering-guided self-supervised learning, and RCC-based clustering into a unified unsupervised objective $\mathcal{L}_{ASRC}=\mathcal{L}_{GAE}+\beta\mathcal{L}_{ssl}$, iteratively updating the graph, weights, and representations. Empirical results on seven benchmarks show that ASRC surpasses baselines requiring prior cluster numbers and demonstrates robustness to graph noise, with ablations confirming the value of adaptive graph learning and debiased negative sampling. The approach offers a practical, label-free solution for clustering diverse unstructured data in real-world settings.
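To make the combined objective $\mathcal{L}_{ASRC}=\mathcal{L}_{GAE}+\beta\mathcal{L}_{ssl}$ concrete, the following is a minimal sketch in plain Python. The summary does not give the exact form of either term, so both are assumptions made for illustration: the reconstruction term is a binary cross-entropy with an inner-product decoder, and the self-supervised term is a standard InfoNCE loss between two views. The function names and the `temperature` parameter are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def gae_reconstruction_loss(z, adj):
    """Mean binary cross-entropy between sigmoid(z_i . z_j) and adj[i][j].

    Assumption: inner-product decoder; the paper's actual L_GAE may differ.
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = dot(z[i], z[j])
            # Numerically stable BCE with logits:
            # max(x, 0) - x * y + log(1 + exp(-|x|))
            total += max(x, 0.0) - x * adj[i][j] + math.log1p(math.exp(-abs(x)))
    return total / (n * n)

def contrastive_loss(z1, z2, temperature=0.5):
    """InfoNCE: row i of z1 and row i of z2 form the positive pair;
    all other rows of z2 act as negatives (assumed form of L_ssl)."""
    a = [normalize(v) for v in z1]
    b = [normalize(v) for v in z2]
    total = 0.0
    for i in range(len(a)):
        logits = [dot(a[i], bj) / temperature for bj in b]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)

def asrc_objective(z1, z2, adj, beta=1.0):
    """L_ASRC = L_GAE + beta * L_ssl, with the assumed terms above."""
    return gae_reconstruction_loss(z1, adj) + beta * contrastive_loss(z1, z2)
```

Setting `beta=0.0` reduces the objective to the reconstruction term alone, which is one way to picture an ablation without the self-supervised component.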

Abstract

We introduce Adaptive Self-supervised Robust Clustering (ASRC), a novel self-supervised deep clustering approach tailored for unstructured data that does not require prior knowledge of the number of clusters. ASRC adaptively learns the graph structure and edge weights to capture both local and global structural information. The resulting graph lets us learn clustering-friendly feature representations using an enhanced graph auto-encoder with contrastive learning. ASRC further uses the clustering results adaptively obtained by robust continuous clustering (RCC) to generate prototypes for negative sampling, which promotes consistency among positive pairs and enlarges the gap between positive and negative samples. ASRC obtains the final clustering results by applying RCC to the learned feature representations together with their consistent graph structure and edge weights. Extensive experiments on seven benchmark datasets demonstrate the efficacy of ASRC and its superior performance over other popular clustering models. Notably, ASRC even outperforms methods that rely on prior knowledge of the number of clusters, highlighting its effectiveness in addressing the challenges of clustering unstructured data.
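The abstract's idea of using clustering results to generate prototypes for negative sampling can be illustrated with a small sketch. It assumes that a prototype is the mean embedding of a cluster and that a sample's negatives are the prototypes of every cluster other than its own; the abstract does not state either detail. The `labels` list is a hypothetical stand-in for the assignments RCC would produce, and both function names are illustrative.

```python
def cluster_prototypes(embeddings, labels):
    """Return {cluster label: mean embedding of that cluster}.

    Assumption: prototypes are cluster means.
    """
    sums, counts = {}, {}
    for z, c in zip(embeddings, labels):
        if c not in sums:
            sums[c] = [0.0] * len(z)
            counts[c] = 0
        sums[c] = [s + a for s, a in zip(sums[c], z)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def negative_prototypes(labels, prototypes):
    """For each sample, collect the prototypes of all clusters except its own,
    so that same-cluster points are never treated as negatives."""
    return [[p for c, p in prototypes.items() if c != label] for label in labels]
```

Because negatives come from other clusters only, points that currently share a cluster are not pushed apart, which is the intuition behind debiased negative sampling.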

Paper Structure (27 sections, 18 equations, 6 figures, 4 tables, 2 algorithms)

Introduction
Preliminary
Notation
Convex Clustering
Robust Continuous Clustering
Adaptive Self-supervised Robust Clustering
Enhanced Adaptive Graph Structure Learning
Weights Updating
Representation Learning
Self-supervised Learning
Adaptive Self-supervised Robust Clustering
Computational Complexity
Experiment
Experimental Settings
Datasets
...and 12 more sections

Figures (6)

Figure 1: The framework of ASRC. First, we augment the features to obtain multiple views of the data and solve problem \ref{pij} to generate the weighted graph. Then, we apply GAE to obtain embeddings. The graph structure is updated adaptively by increasing the sparsity parameter $k$. After each graph update, we incorporate contrastive learning to retrain GAE. Once training converges, we use the learned graph structure, edge weights, and representations as inputs to robust continuous clustering to obtain clustering results. We then use the clustering results to guide the selection of negative samples and update the clustering results until convergence.
Figure 2: Parameter sensitivity of ASRC on UMIST. The results are presented in percentage format.
Figure : (a) Raw features
Figure : (b) SDCN
...and 1 more figures
