Rethinking Deep Clustering Paradigms: Self-Supervision Is All You Need

Amal Shaheen, Nairouz Mrabah, Riadh Ksantini, Abdulla Alqaddoumi

TL;DR

This work identifies three core limitations of existing deep clustering paradigms (Feature Randomness, Feature Drift, and Feature Twist) that arise from the interplay between self-supervision and pseudo-supervision. It proposes R-DC, a two-phase self-supervision framework that replaces pseudo-label-based signals with proximity-level self-supervision, supported by a dual-filtering mechanism that constructs nearest-neighbor centroids. The method preserves the geometric structure of the latent manifolds while avoiding random feature generation and feature drift, yielding state-of-the-art clustering performance across six datasets and improved robustness to hyperparameters and noise. By removing pseudo-supervision and enabling a smooth transition from instance-level to neighborhood-level cues, R-DC offers a principled, scalable path for deep clustering of high-dimensional data.

Abstract

Recent advances in deep clustering have been made possible by significant progress in self-supervised and pseudo-supervised learning. However, the trade-off between self-supervision and pseudo-supervision gives rise to three primary issues. Joint training causes Feature Randomness and Feature Drift, whereas independent training causes Feature Randomness and Feature Twist. In essence, training with pseudo-labels generates random and unreliable features; combining pseudo-supervision with self-supervision causes reliable clustering-oriented features to drift; and moving from self-supervision to pseudo-supervision can twist the curved latent manifolds. This paper addresses these three limitations of existing deep clustering paradigms. We propose a new paradigm, denoted R-DC (Rethinking the Deep Clustering Paradigms), that replaces pseudo-supervision with a second round of self-supervised training. This strategy makes the transition from instance-level self-supervision to neighborhood-level self-supervision smoother and less abrupt. It also prevents the drift caused by the strong competition between instance-level self-supervision and clustering-level pseudo-supervision, and the absence of pseudo-supervision avoids the risk of generating random features. Experimental results on six datasets show that the two-level self-supervised training yields substantial improvements.

Paper Structure

This paper contains 30 sections, 17 equations, 20 figures, 8 tables, 1 algorithm.

Figures (20)

  • Figure 1: First evidence of Feature Twist. Average ID and LID of DEC on MNIST and FMNIST under two pretraining strategies: vanilla reconstruction and instance-level contrastive learning. Average ID: average intrinsic dimension (ID) of the clustering manifolds. LID: number of dimensions needed to capture $90\%$ of the variance (linear correlations), estimated with Principal Component Analysis (PCA).
  • Figure 2: Second evidence of Feature Twist. Collapse of the latent structures in the clustering phase.
  • Figure 3: Accuracy comparison between Phase 1 (i.e., pretraining) and Phase 2 (i.e., fine-tuning) for DynAE, IDEC, and DEC on five datasets.
  • Figure 4: The first deep clustering paradigm. This paradigm does not require self-supervised training. It only uses pseudo-supervision to train a deep neural network.
  • Figure 5: The second deep clustering paradigm. This paradigm involves two phases: pretraining and fine-tuning. Initially, a deep neural network is trained with self-supervised learning. Then, the network is fine-tuned with pseudo-supervision.
  • ...and 15 more figures
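
The LID measure named in the Figure 1 caption (the number of PCA dimensions needed to capture $90\%$ of the variance) can be illustrated with a short sketch. This is not the authors' code: the function name `linear_intrinsic_dimension`, its `threshold` parameter, and the toy data are assumptions made for illustration, and the paper may compute the quantity differently (for example, on latent codes of a specific layer).

```python
import numpy as np


def linear_intrinsic_dimension(embeddings, threshold=0.90):
    """Return the smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold` (hypothetical helper)."""
    x = np.asarray(embeddings, dtype=float)
    centered = x - x.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional to the
    # eigenvalues of its covariance matrix, i.e. the PCA variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    total = variances.sum()
    if total == 0.0:
        return 0  # all points coincide: no direction carries variance
    cumulative_ratio = np.cumsum(variances) / total
    # First index at which the cumulative ratio reaches the threshold.
    index = int(np.searchsorted(cumulative_ratio, threshold, side="left"))
    return min(index + 1, len(variances))


# Toy data (assumed): four points spread equally along two orthogonal axes
# of a 5-D space. Each axis carries half of the variance, so two principal
# components are needed to reach 90%.
points = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0, 0.0],
])
lid = linear_intrinsic_dimension(points)
```

In practice the input would be the latent embeddings of a trained encoder rather than hand-built points. A drop in this value between pretraining and the clustering phase is the kind of signal the caption associates with Feature Twist.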