Breaking the Reclustering Barrier in Centroid-based Deep Clustering
Lukas Miklautz, Timo Klein, Kevin Sidak, Collin Leiber, Thomas Lang, Andrii Shkabrii, Sebastian Tschiatschek, Claudia Plant
TL;DR
Centroid-based deep clustering (DC) often hits an early performance ceiling, the reclustering barrier, where periodic reclustering fails to meaningfully alter the latent space or improve results. BRB addresses this by coupling soft weight resets with reclustering, plus optional momentum resets, to introduce structured, persistent perturbations that expand the space of clustering targets while preserving learned knowledge. Across eight datasets and multiple DC baselines (DEC, IDEC, DCN), BRB yields consistent gains and enables training from scratch. When combined with contrastive learning (e.g., SimCLR, SCAN), it reaches or surpasses state-of-the-art performance on challenging benchmarks. The approach is lightweight and broadly applicable, and it offers new insight into how exploration, knowledge preservation, and target adaptation can overcome early optimization barriers in deep clustering.
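To make the two reset operations concrete, here is a minimal NumPy sketch. It assumes that a soft weight reset interpolates each weight array between its current value and a freshly sampled random initialization, controlled by a strength parameter alpha, and that a momentum reset zeroes Adam-style first and second moment estimates. The helper names soft_reset and reset_momentum, the Glorot-uniform re-initialization, and the {"m", "v"} optimizer-state layout are all hypothetical illustrations, not the exact rules or API of the released BRB code.

```python
import numpy as np


def soft_reset(weights, alpha, rng):
    """Hypothetical soft weight reset: w <- alpha * w + (1 - alpha) * w_fresh.

    alpha = 1.0 leaves the weights unchanged; alpha = 0.0 is a full (hard) reset.
    The Glorot-uniform draw for w_fresh is an assumption of this sketch.
    """
    new_weights = {}
    for name, w in weights.items():
        fan_in, fan_out = w.shape[0], w.shape[-1]
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        fresh = rng.uniform(-limit, limit, size=w.shape)
        # Interpolate: part of the learned knowledge is kept, part is perturbed.
        new_weights[name] = alpha * w + (1.0 - alpha) * fresh
    return new_weights


def reset_momentum(opt_state):
    """Hypothetical momentum reset: zero Adam-style moment estimates per parameter.

    Whether a step counter (used for bias correction) should also be reset is
    left open here; this sketch only clears "m" and "v".
    """
    return {
        name: {"m": np.zeros_like(s["m"]), "v": np.zeros_like(s["v"])}
        for name, s in opt_state.items()
    }


# Usage: perturb a toy encoder and clear its optimizer moments.
rng = np.random.default_rng(0)
weights = {"encoder.W": rng.normal(size=(4, 3)), "encoder.b": np.zeros(3)}
perturbed = soft_reset(weights, alpha=0.8, rng=rng)
opt_state = {"encoder.W": {"m": np.ones((4, 3)), "v": np.ones((4, 3))}}
cleared = reset_momentum(opt_state)
```

Interpolating rather than re-drawing the weights keeps part of what the encoder has learned while still moving it away from the configuration it has over-committed to; alpha sets that trade-off.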
Abstract
This work investigates an important phenomenon in centroid-based deep clustering (DC) algorithms: performance quickly saturates after a period of rapid early gains. Practitioners commonly counter early saturation with periodic reclustering, which we demonstrate is insufficient to overcome these performance plateaus. We call this phenomenon the "reclustering barrier" and empirically show when it occurs, what its underlying mechanisms are, and how it is possible to Break the Reclustering Barrier with our algorithm BRB. BRB avoids early over-commitment to initial clusterings and enables continuous adaptation to reinitialized clustering targets while remaining conceptually simple. Applying our algorithm to widely used centroid-based DC algorithms, we show that (1) BRB consistently improves performance across a wide range of clustering benchmarks, (2) BRB enables training from scratch, and (3) BRB performs competitively with state-of-the-art DC algorithms when combined with a contrastive loss. We release our code and pre-trained models at https://github.com/Probabilistic-and-Interactive-ML/breaking-the-reclustering-barrier.
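For reference, the periodic reclustering step discussed above can be sketched as running k-means on the current embeddings and replacing the cluster centroids with the result. The NumPy sketch below assumes plain Lloyd's k-means with greedy farthest-point seeding (a simplification of k-means++). The function names and the embed_fn callback are hypothetical and are not taken from the BRB code base.

```python
import numpy as np


def farthest_point_init(z, k, rng):
    """Greedy farthest-point seeding: a simplified, deterministic k-means++ variant."""
    idx = [int(rng.integers(len(z)))]
    d2 = ((z - z[idx[0]]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d2.argmax())  # the point farthest from all chosen seeds
        idx.append(nxt)
        d2 = np.minimum(d2, ((z - z[nxt]) ** 2).sum(axis=1))
    return z[idx].copy()


def kmeans(z, k, rng, n_iter=50):
    """Plain Lloyd's k-means on embeddings z (n x d); returns centroids and labels."""
    centroids = farthest_point_init(z, k, rng)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = z[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def recluster(embed_fn, x, k, rng):
    """Hypothetical reclustering step: new clustering targets from current embeddings."""
    z = embed_fn(x)
    return kmeans(z, k, rng)


# Usage: two well-separated blobs, with an identity "encoder" standing in for a network.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)), rng.normal(5.0, 0.1, size=(20, 2))])
centroids, labels = recluster(lambda v: v, x, k=2, rng=rng)
```

On its own, this step only re-derives centroids from an embedding that may already have settled. The reclustering barrier described above is the observation that repeating it fails to move the latent space meaningfully.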
