Image Clustering Algorithm Based on Self-Supervised Pretrained Models and Latent Feature Distribution Optimization

Qiuyu Zhu, Liheng Hu, Sijin Wang

TL;DR

The paper addresses the poor clustering accuracy of deep clustering on complex natural images by leveraging self-supervised pretrained models to obtain more discriminative latent features and by optimizing latent feature distributions with PEDCC-guided centroids. It introduces the ICBPL framework, an encoder-only architecture trained with four complementary losses, including MMD to PEDCC, augmentation-consistency, k-nearest neighbor, and minimum cosine-distance to PEDCC centers, with periodic KNN updates. Empirical results show state-of-the-art clustering performance across CIFAR-10, STL-10, CIFAR-100, and ImageNet-50, with encoder-only training offering computational advantages and achieving near-supervised performance on some datasets. The work highlights the benefits of combining self-supervised pretraining, neighborhood information, and centroid-based distribution constraints for robust image clustering, and points to future work on scaling to many classes and finer granularity.

Abstract

When faced with complex natural images, existing deep clustering algorithms fall significantly short of supervised classification methods in clustering accuracy, which makes them less practical. This paper introduces an image clustering algorithm based on self-supervised pretrained models and latent feature distribution optimization that substantially enhances clustering performance. We find that: (1) For complex natural images, leveraging self-supervised pretrained models and fine-tuning them effectively enhances the discriminative power of latent features, resulting in improved clustering performance. (2) In the latent feature space, searching for the k-nearest neighbor images of each training sample and shortening the distance between the training sample and its nearest neighbor further enhances the discriminative power of latent features and improves clustering performance. (3) In the latent feature space, reducing the distance between sample features and the nearest predefined cluster centroids optimizes the distribution of latent features, further improving clustering performance. In experiments on multiple datasets, our approach outperforms the latest clustering algorithms and achieves state-of-the-art clustering results. When the number of categories in a dataset is small and the differences between categories are significant, as in CIFAR-10 and STL-10, our clustering algorithm achieves accuracy similar to that of supervised methods that do not use pretrained models, and slightly lower than that of supervised methods that use pretrained models. The code for the algorithm is available at https://github.com/LihengHu/semi.
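
Point (3) of the abstract, together with the final classification step, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `l2_normalize` and `assign_to_centroids` are hypothetical, and the identity-matrix rows used as centroids are only a stand-in for real PEDCC points, whose construction is not described in this section. The sketch assigns each latent feature to the predefined centroid with the smallest cosine distance and reports the mean of that minimum distance, which is the quantity a centroid-distance loss would try to reduce.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def assign_to_centroids(features, centroids):
    """Return (cluster labels, mean minimum cosine distance to a centroid).

    features:  (n_samples, dim) latent features from an encoder (assumed given).
    centroids: (n_clusters, dim) fixed, predefined cluster centers.
    """
    f = l2_normalize(features)
    c = l2_normalize(centroids)
    cos_sim = f @ c.T                          # (n_samples, n_clusters)
    labels = cos_sim.argmax(axis=1)            # nearest centroid per sample
    min_cos_dist = 1.0 - cos_sim.max(axis=1)   # distance to that centroid
    return labels, float(min_cos_dist.mean())


# Toy data: three orthogonal unit vectors stand in for predefined centroids
# (an assumption for illustration, not the actual PEDCC construction).
centroids = np.eye(3)
features = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 2.0, 0.1],
    [0.1, 0.0, 0.5],
    [1.0, 0.0, 0.0],
])

labels, loss = assign_to_centroids(features, centroids)
print("labels:", labels.tolist())
print("mean minimum cosine distance:", loss)
```

Because the centroids are fixed in advance, the same function serves both as a training signal (the returned mean distance) and as the inference rule (the returned labels).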

Paper Structure (24 sections, 13 equations, 3 figures, 11 tables, 1 algorithm)

Figures (3)

  • Figure 1: Algorithm flow diagram. During training, samples are channeled into the clustering network via three separate pathways: the first processes the sample as-is, the second processes the augmented version of the sample, and the third processes the nearest-neighbor sample. Four loss functions act jointly on the latent features output by the network, optimizing their representation and distribution. Finally, classification is based on the minimum cosine distance between the latent features and the PEDCC points. The circle in the figure represents the PEDCC.
  • Figure 2: Visualization of PEDCC points.
  • Figure 3: Algorithm diagram, where $P$ is the PEDCC point, $X$ is the original sample, $X_k$ is the nearest-neighbor sample, and $X_a$ is the augmented sample.
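
The nearest-neighbor pathway in the captions above (the sample $X$ and its neighbor $X_k$) can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the helper names `knn_indices` and `neighbor_loss` are hypothetical, cosine similarity is assumed as the neighbor metric, and the loss is taken to be a plain mean cosine distance, since the exact form used by the authors is not given in this section.

```python
import numpy as np


def knn_indices(features, k):
    """Indices of the k most cosine-similar other samples for each row."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                     # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)    # a sample is not its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def neighbor_loss(features, neighbors):
    """Mean cosine distance between each sample and its listed neighbors."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    # f[neighbors] has shape (n_samples, k, dim); take row-wise dot products.
    cos = np.einsum("nd,nkd->nk", f, f[neighbors])
    return float((1.0 - cos).mean())


# Toy latent features forming two tight groups (illustrative values only).
X = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])

neighbors = knn_indices(X, k=1)
loss = neighbor_loss(X, neighbors)
print("nearest neighbors:", neighbors.tolist())
print("neighbor loss:", loss)
```

In a training loop, `knn_indices` would be recomputed only every so often on the current latent features, in the spirit of the periodic KNN updates mentioned in the TL;DR, while `neighbor_loss` would be evaluated on every batch.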