Learning Representations for Clustering via Partial Information Discrimination and Cross-Level Interaction

Hai-Xin Zhang, Dong Huang, Hua-Bao Ling, Guang-Yu Zhang, Wei-jun Sun, Zi-hao Wen

TL;DR

PICI addresses unsupervised image clustering by integrating partial information discrimination with cross-level interaction in a Transformer backbone. It combines masked image modeling with two parallel views, a partial-information self-discriminator, two-level contrastive learning, and a cross-level interaction constraint to align instance- and cluster-level spaces. Empirical results on six real-world datasets show substantial improvements over prior deep clustering methods, validating the effectiveness of partial information cues and cross-level guidance for representation learning. The approach is unsupervised, scalable, and accompanied by open-source code for practical deployment.

Abstract

In this paper, we present a novel deep image clustering approach termed PICI, which enforces partial information discrimination and cross-level interaction in a joint learning framework. In particular, we leverage a Transformer encoder as the backbone, through which masked image modeling with two parallel augmented views is formulated. After the Transformer encoder derives the class tokens from the masked images, three partial information learning modules are incorporated: the PISD module, which trains the auto-encoder via masked image reconstruction; the PICD module, which employs two levels of contrastive learning; and the CLI module, which enables mutual interaction between the instance-level and cluster-level subspaces. Extensive experiments on six real-world image datasets demonstrate the superior clustering performance of the proposed PICI approach over state-of-the-art deep clustering approaches. The source code is available at https://github.com/Regan-Zhang/PICI.
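
The "two levels of contrastive learning" in the PICD module can be illustrated with a small sketch. The abstract does not give the loss formulas, so the code below assumes an NT-Xent style loss applied once to the rows of the feature matrices (instance level) and once to the columns of the soft cluster-assignment matrices (cluster level), as is common in contrastive clustering. The function names, the temperatures, and the unweighted sum are illustrative assumptions, not the authors' implementation.

```python
import math


def _normalize(v):
    # L2-normalize a vector; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def nt_xent(a, b, temperature=0.5):
    """NT-Xent loss over paired vectors: a[i] and b[i] form a positive pair,
    every other vector in the two views acts as a negative."""
    z = [_normalize(v) for v in a] + [_normalize(v) for v in b]
    n = len(a)
    total = 0.0
    for i in range(2 * n):
        sims = [_dot(z[i], z[j]) / temperature for j in range(2 * n)]
        positive = sims[(i + n) % (2 * n)]
        # Denominator runs over all vectors except the anchor itself.
        denom = sum(math.exp(s) for j, s in enumerate(sims) if j != i)
        total += math.log(denom) - positive
    return total / (2 * n)


def two_level_contrastive_loss(h_a, h_b, p_a, p_b, tau_inst=0.5, tau_clu=1.0):
    """h_a, h_b: instance features of the two views (one row per sample).
    p_a, p_b: soft cluster assignments of the two views (one row per sample,
    one column per cluster). All hyperparameters here are assumed values."""
    instance_loss = nt_xent(h_a, h_b, tau_inst)
    # Transposing turns each cluster's column into one vector to contrast.
    cols_a = [list(c) for c in zip(*p_a)]
    cols_b = [list(c) for c in zip(*p_b)]
    cluster_loss = nt_xent(cols_a, cols_b, tau_clu)
    return instance_loss + cluster_loss
```

In this sketch, the loss decreases when the two views of the same sample (or the same cluster column) agree and increases when they are mismatched, which is the behavior both contrastive levels rely on.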

Paper Structure

This paper contains 23 sections, 15 equations, 4 figures, 7 tables, and 1 algorithm.

Figures (4)

  • Figure 1: An overview of the proposed PICI framework, which jointly incorporates three learning modules, namely, (a) the PISD module, which enforces partial information self-discrimination upon the masked images via the Transformer auto-encoder, (b) the PICD module, which takes the class tokens [CLS] as input and achieves partial information contrastive discrimination via two levels of contrastive learning, and (c) the CLI module, which enables mutual interaction between the instance-level and cluster-level subspaces by constraining their cross-level consistency.
  • Figure 2: Some examples of the six image datasets used for evaluation, including four remote sensing datasets [yang2010bag, long2017accurate, zhao2015dirichlet, xia2017aid], a crop pest dataset [xie2018multi], and a medical dataset [zhu2021hard].
  • Figure 3: Illustration of the convergence of PICI (w.r.t. NMI, ACC, ARI) on the RSOD and Chaoyang datasets.
  • Figure 4: The t-SNE visualization of PICI on the RSOD dataset.
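
Figure 3 tracks convergence with NMI, ACC, and ARI. As a reminder of what the first of these measures, here is a minimal sketch of normalized mutual information between ground-truth labels and predicted cluster labels. It assumes normalization by the arithmetic mean of the two entropies; the normalization used in the paper is not stated in this section, and the function name is illustrative.

```python
import math
from collections import Counter


def normalized_mutual_info(labels_true, labels_pred):
    """NMI = I(T; P) / mean(H(T), H(P)), computed with natural logarithms."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))

    # Mutual information from the joint and marginal label counts.
    mi = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())

    denom = (h_true + h_pred) / 2
    # Both partitions trivial (a single group each): treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0
```

Because NMI depends only on how samples are grouped, a prediction that matches the ground truth up to a renaming of cluster IDs still scores 1, which is why the metric suits unsupervised clustering.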