A Survey on Deep Clustering: From the Prior Perspective
Yiding Lu, Haobin Li, Yunfan Li, Yijie Lin, Xi Peng
TL;DR
The survey reframes deep clustering through the lens of prior knowledge. It argues that progress hinges on how priors, grouped into six categories, guide feature learning and cluster assignment in the absence of labels. It traces a trajectory from structure- and distribution-based priors to augmentation invariance, neighborhood consistency, pseudo-labeling, and external knowledge. It also benchmarks representative methods on five widely used datasets to show how performance improves as successive priors are adopted. The authors analyze the gains associated with each new type of prior, discuss practical applications, and outline open challenges, including fine-grained, non-parametric, fair, and multi-view clustering. This perspective offers a concise, application-oriented map that could steer future deep clustering research, particularly toward external knowledge and cross-modal signals. The work also points to potential synergies with external pre-trained models and vision-language frameworks for further improving clustering, with broad implications for unsupervised learning on complex real-world data.
Abstract
Facilitated by the powerful feature extraction ability of neural networks, deep clustering has achieved great success in analyzing high-dimensional and complex real-world data. The performance of deep clustering methods is affected by various factors such as network structures and learning objectives. However, as this survey argues, the essence of deep clustering lies in the incorporation and utilization of prior knowledge, which existing works have largely overlooked. From pioneering deep clustering methods based on data structure assumptions to recent contrastive clustering methods based on data augmentation invariance, the development of deep clustering intrinsically corresponds to the evolution of prior knowledge. In this survey, we provide a comprehensive review of deep clustering methods by categorizing them into six types of prior knowledge. We find that, in general, innovation in priors follows two trends: i) from mining to constructing, and ii) from internal to external. In addition, we provide a benchmark on five widely used datasets and analyze the performance of methods with diverse priors. By offering a novel prior knowledge perspective, we hope this survey provides new insights and inspires future research in the deep clustering community.
