ClearGCD: Mitigating Shortcut Learning For Robust Generalized Category Discovery

Kailin Lyu, Jianwei He, Long Xiao, Jianing Zeng, Liang Fan, Lin Shu, Jie Hao

TL;DR

Addresses generalization and forgetting in open-world Generalized Category Discovery by pinpointing shortcut learning as a key bottleneck. Introduces ClearGCD, a plug-and-play framework with Semantic View Alignment (SVA) and Shortcut Suppression Regularization (SSR) to suppress non-semantic cues and stabilize representations. Uses a parametric clustering setup with cross-augmentation consistency and adaptive prototype alignment to improve both known and novel category recognition. Demonstrates consistent improvements over state-of-the-art baselines across CIFAR, ImageNet-100, and fine-grained datasets, validating practical impact for open-world vision tasks.

Abstract

In open-world scenarios, Generalized Category Discovery (GCD) requires identifying both known and novel categories within unlabeled data. However, existing methods often suffer from prototype confusion caused by shortcut learning, which undermines generalization and leads to forgetting of known classes. We propose ClearGCD, a framework designed to mitigate reliance on non-semantic cues through two complementary mechanisms. First, Semantic View Alignment (SVA) generates strong augmentations via cross-class patch replacement and enforces semantic consistency using weak augmentations. Second, Shortcut Suppression Regularization (SSR) maintains an adaptive prototype bank that aligns known classes while encouraging separation of potential novel ones. ClearGCD can be seamlessly integrated into parametric GCD approaches and consistently outperforms state-of-the-art methods across multiple benchmarks.
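The cross-class patch replacement behind SVA can be illustrated with a minimal sketch. The abstract does not specify how patches are chosen, so everything below is an assumption made for illustration: the function name `cross_class_patch_replace`, the `patch` and `ratio` parameters, the fixed grid layout, and the choice to copy donor patches from the same grid positions are all hypothetical, and images are plain nested lists rather than tensors. It is not the authors' implementation.

```python
import random


def cross_class_patch_replace(images, labels, patch=4, ratio=0.25, seed=0):
    """Return copies of `images` in which a fraction of grid patches is
    overwritten with patches from an image of a different class.

    Hypothetical sketch: images are 2D lists (H x W) of floats, and donor
    patches are copied from the same grid positions.
    """
    rng = random.Random(seed)
    height, width = len(images[0]), len(images[0][0])
    grid_h, grid_w = height // patch, width // patch
    n_swap = round(ratio * grid_h * grid_w)

    # Deep-copy so the original (weakly augmented) views stay untouched.
    out = [[row[:] for row in img] for img in images]

    for i, label in enumerate(labels):
        # Candidate donors are images whose label differs from this one.
        donors = [j for j, other in enumerate(labels) if other != label]
        if not donors or n_swap == 0:
            continue  # no other class available; leave the image unchanged
        donor = images[rng.choice(donors)]
        # Pick distinct grid cells and copy the donor's pixels into them.
        for cell in rng.sample(range(grid_h * grid_w), n_swap):
            r, c = divmod(cell, grid_w)
            x0, x1 = c * patch, (c + 1) * patch
            for y in range(r * patch, (r + 1) * patch):
                out[i][y][x0:x1] = donor[y][x0:x1]
    return out


# Toy batch: one all-zero image (class 0) and one all-one image (class 1).
zeros = [[0.0] * 16 for _ in range(16)]
ones = [[1.0] * 16 for _ in range(16)]
mixed = cross_class_patch_replace([zeros, ones], [0, 1])
```

In a pipeline of the kind the abstract describes, the mixed image would serve as the strong view and the untouched image as the weak view, with a consistency loss tying their predictions together; that loss is not shown here because the abstract does not define it.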

Paper Structure

This paper contains 10 sections, 11 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Visual explanations produced by GradCAM++ on the CIFAR-10 training set. Although all methods predict the correct class, “shortcut learning” persists in DCCL and SimGCD.
  • Figure 2: Overview of the proposed framework for GCD. By integrating SVA and SSR, we disrupt shortcut learning and enhance generalization performance in generalized category discovery.
  • Figure 3: t-SNE visualization of test-data representations on CIFAR-10.
  • Figure 4: Ablation study on $M$ and $\beta$, conducted on CUB.