Revisiting Mutual Information Maximization for Generalized Category Discovery

Zhaorui Tan, Chengrui Zhang, Xi Yang, Jie Sun, Kaizhu Huang

TL;DR

This work tackles generalized category discovery (GCD), where unlabeled data contain both known and unknown classes. It introduces Regularized Parametric InfoMax (RPIM), an InfoMax-based framework that enforces an independence constraint between known and unknown predictions while assuming a uniform class prior on unconfident outputs, and leverages a semantic-bias transformation to refine latent features without costly fine-tuning. RPIM uses Sinkhorn to generate soft pseudo-labels, selects confident pseudo-labels via a threshold, and adds a regularization term $L_R$ to promote reliable pseudo-labels and strengthen class separation; it also includes a regularization $L_S$ to align pseudo-labels with labeled anchors. Theoretical analysis and experiments on six datasets show RPIM achieves state-of-the-art performance, notably improving unknown-class accuracy by an average of about $3.5\%$ and delivering substantial gains on several benchmark datasets, while maintaining efficiency through latent-feature refinement rather than full fine-tuning. This approach enables more scalable open-world recognition by robustly discovering unknown categories with improved separation from known classes.
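The pseudo-labeling steps summarized above (Sinkhorn soft labels followed by a confidence threshold) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function names `sinkhorn_pseudo_labels` and `select_confident`, the temperature `epsilon`, the iteration count, and the threshold value are all hypothetical choices, and the regularizers $L_R$ and $L_S$ are not reproduced because their form is not given here.

```python
import numpy as np

def sinkhorn_pseudo_labels(logits, n_iters=3, epsilon=0.05):
    """Soft pseudo-labels via Sinkhorn-Knopp normalization (hypothetical helper).

    logits: array of shape (N, K) with classifier scores for N unlabeled samples.
    Returns an (N, K) array whose rows sum to 1.
    """
    # Subtract the global max for numerical stability before exponentiating.
    q = np.exp((logits - logits.max()) / epsilon).T  # shape (K, N)
    q /= q.sum()
    k, n = q.shape
    for _ in range(n_iters):
        # Push every class (row) toward the same total mass 1/K.
        q /= q.sum(axis=1, keepdims=True)
        q /= k
        # Push every sample (column) toward the same total mass 1/N.
        q /= q.sum(axis=0, keepdims=True)
        q /= n
    q *= n  # rescale so each sample's assignment sums to 1
    return q.T

def select_confident(soft_labels, threshold=0.9):
    """Keep samples whose top soft-label probability reaches the threshold."""
    confidence = soft_labels.max(axis=1)
    mask = confidence >= threshold
    hard_labels = soft_labels.argmax(axis=1)
    return mask, hard_labels

# Usage with random scores standing in for classifier outputs (assumed data).
rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 3))
soft = sinkhorn_pseudo_labels(scores)
mask, hard = select_confident(soft, threshold=0.9)
```

The alternating row and column normalization discourages all samples from collapsing onto one class, which is the usual reason for preferring Sinkhorn over a plain softmax when generating pseudo-labels.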

Abstract

Generalized category discovery poses a realistic challenge: the model must generalize well enough to recognize unlabeled samples from both known and unknown categories. This paper revisits generalized category discovery through the lens of information maximization (InfoMax) with a probabilistic parametric classifier. Our findings reveal that ensuring independence between known and unknown classes, while concurrently assuming a uniform probability distribution across all classes, yields an enlarged margin between known and unknown classes, which improves the model's performance. To achieve this independence, we propose a novel InfoMax-based method, Regularized Parametric InfoMax (RPIM), which uses pseudo labels to supervise unlabeled samples during InfoMax and adds a regularization to ensure the quality of the pseudo labels. Additionally, we introduce a novel semantic-bias transformation that refines the features from the pre-trained model instead of fine-tuning it directly, which reduces computational costs. Extensive experiments on six benchmark datasets validate the effectiveness of our method. RPIM significantly improves performance on unknown classes, surpassing the state-of-the-art method by an average margin of 3.5%.
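For readers unfamiliar with InfoMax, the quantity being maximized is commonly estimated from a batch of classifier outputs as $I(X;\hat{Y}) = H(\hat{Y}) - H(\hat{Y}\mid X)$: the entropy of the average prediction minus the average entropy of each prediction. The sketch below shows this generic estimator only. It is not the RPIM objective, and the function name `mutual_information` and the smoothing constant `eps` are assumptions made for illustration.

```python
import numpy as np

def mutual_information(probs, eps=1e-12):
    """Batch estimate of I(X; Y_hat) = H(Y_hat) - H(Y_hat | X).

    probs: array of shape (N, K); each row is a predicted class distribution.
    """
    # Marginal entropy: high when predictions are spread evenly across classes.
    marginal = probs.mean(axis=0)
    h_marginal = -np.sum(marginal * np.log(marginal + eps))
    # Conditional entropy: low when each individual prediction is confident.
    h_conditional = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return h_marginal - h_conditional

# Confident, balanced predictions reach the maximum log(K); here K = 2.
confident = np.array([[1.0, 0.0], [0.0, 1.0]])
# Uniform predictions carry no information about the input.
uninformative = np.array([[0.5, 0.5], [0.5, 0.5]])
mi_confident = mutual_information(confident)
mi_uninformative = mutual_information(uninformative)
```

The marginal-entropy term is where a uniform class prior enters: it is largest when the average prediction is uniform over classes.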

Paper Structure

This paper contains 15 sections, 2 theorems, 24 equations, 5 figures, 8 tables.

Key Result

Proposition

Given that $\hat{Y}$ and $\hat{Y}^U_{T}$ are reliable and confident, $R(\hat{Y}^U)$ can be considered constant and therefore omitted. Under this assumption, maximizing eq:our_H_final_conn leads to a lower supremum of the risk than maximizing eq:old_H.

Figures (5)

  • Figure 1: (a) Diagram of the problem setting. (b) Visualization of the confusion issue on CIFAR100 krizhevsky2009learning. Left: t-SNE map of latent features $Z$ from those classes. Right: Confusion matrix of the unlabeled set between known and unknown classes. Satisfying only the uniform assumption for unconfident predictions causes confusion, which our proposed RPIM effectively mitigates.
  • Figure 2: Results averaged across all datasets for k-means macqueen1967classification, RankStats+ han2021autonovel, UNO+ fini2021unified, ORCA cao2022openworld, GCD vaze2022generalized, RIM krause2010discriminative, TIM boudiaf2020information, PIM chiaroni2023parametric, and our proposed RPIM, reported on all classes, known classes, and unknown classes.
  • Figure 3: Results averaged across all datasets for the ablation studies and our proposed RPIM.
  • Figure 4: Density histogram of latent features $Z$ of unlabeled samples from the known and unknown classes on CIFAR100.
  • Figure 5: t-SNE map of unlabeled data latent features $Z^L$ from models that use different transformations, trained on the CIFAR10 dataset. Different colors represent different classes. Our proposed semantic-bias transformation leads to the best results.

Theorems & Definitions (5)

  • Proposition
  • Proof
  • Proposition
  • Proof
  • Proof