You Can Trust Your Clustering Model: A Parameter-free Self-Boosting Plug-in for Deep Clustering
Hanyang Li, Yuheng Jia, Hui Liu, Junhui Hou
TL;DR
DCBoost addresses the gap where deep clustering methods often exhibit weak global structure despite strong local neighborhoods. It is a parameter-free plug-in that uses adaptive $k$-NN consistency within mini-batches to select high-confidence samples as anchors, then optimizes a pseudo-label augmented discriminative loss $L=L_{pos}+L_{neg}+L_{ins}$ to tighten intra-class clustering and enhance inter-class separation. Across five benchmarks and six baselines, DCBoost delivers consistent gains, markedly improving silhouette scores and ACC while maintaining efficiency, and even extending benefits to CLIP-based models. By strategically leveraging reliable local cues to refine the global feature space, DCBoost offers a practical, scalable boost for deep clustering with broad applicability.
Abstract
Recent deep clustering models have achieved impressive clustering performance. However, a common issue with existing methods is the disparity between global and local feature structures. While local structures typically show strong consistency and compactness among samples of the same class, global features often present intertwined boundaries and poorly separated clusters. Motivated by this observation, we propose DCBoost, a parameter-free plug-in designed to enhance the global feature structures of current deep clustering models. By harnessing reliable local structural cues, our method aims to improve clustering performance. Specifically, we first identify high-confidence samples through adaptive $k$-nearest neighbors-based consistency filtering, aiming to select a sufficient number of samples with high label reliability to serve as trustworthy anchors for self-supervision. Subsequently, these samples are used to compute a discriminative loss, which promotes both intra-class compactness and inter-class separability, to guide network optimization. Extensive experiments across various benchmark datasets show that DCBoost significantly improves the clustering performance of diverse existing deep clustering models. Notably, our method improves the performance of current state-of-the-art baselines (e.g., ProPos) by more than 3% and increases the silhouette coefficient by over $7\times$. Code is available at <https://github.com/l-h-y168/DCBoost>.
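To make the idea of $k$-nearest neighbors-based consistency filtering concrete, the following is a minimal sketch and not the released DCBoost implementation. It keeps a sample as a candidate anchor only when its nearest neighbors (by cosine similarity) share its pseudo-label. The function name `knn_consistent_mask`, the fixed `k`, and the `min_agreement` threshold are assumptions made for illustration; the method described above chooses $k$ adaptively and is parameter-free, which this sketch does not attempt to reproduce.

```python
import numpy as np

def knn_consistent_mask(features, pseudo_labels, k=2, min_agreement=1.0):
    """Flag samples whose k nearest neighbours share their pseudo-label.

    Illustrative only: k and min_agreement are hypothetical fixed values,
    whereas the method in the abstract selects k adaptively.
    """
    # L2-normalise rows so that dot products equal cosine similarities.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T
    # Exclude each sample from its own neighbour list.
    np.fill_diagonal(sim, -np.inf)
    # Indices of the k most similar samples in every row (order within k is arbitrary).
    nn_idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    # Fraction of neighbours whose pseudo-label matches the sample's own.
    agree = (pseudo_labels[nn_idx] == pseudo_labels[:, None]).mean(axis=1)
    return agree >= min_agreement

# Toy mini-batch: two well-separated directions in 2-D.
features = np.array([
    [1.0, 0.0], [1.0, 0.02], [1.0, -0.02], [1.0, 0.3],   # group near the x-axis
    [0.0, 1.0], [0.02, 1.0], [-0.02, 1.0], [0.04, 1.0],  # group near the y-axis
])
# Sample 3 lies in the first group but carries the second group's pseudo-label.
pseudo_labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])

mask = knn_consistent_mask(features, pseudo_labels, k=2)
print(mask)
```

In this toy batch only sample 3 is rejected, because both of its nearest neighbors carry a different pseudo-label. The retained samples would then play the role of the anchors on which a discriminative loss is computed.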
