Composing Novel Classes: A Concept-Driven Approach to Generalized Category Discovery
Chuyu Zhang, Peiyan Gu, Xueyang Yu, Xuming He
TL;DR
This work tackles generalized category discovery by introducing ConceptGCD, a three-stage, concept-driven framework. It first extracts known-class concepts with a model pre-trained on the known classes, then generates derivable concepts from them via a generator layer trained with a covariance-augmented loss, and finally expands the generator layer to learn underivable concepts, using concept score normalization to balance concept influence and a contrastive loss to preserve previously learned concepts. Learning derivable and underivable concepts in separate stages is intended to reduce noise from unlabeled data and to make better use of known-class knowledge in a richer representation space. Empirical results across six benchmarks with ViT-based backbones show substantial gains over state-of-the-art methods, including notable improvements on novel-class clustering and in mixed known/novel settings, and the method remains robust when the number of novel clusters is unknown. The covariance-driven concept diversification, explicit concept transfer, and normalization mechanism offer a practical baseline for future GCD and open-world recognition research, highlighting the value of concept-level transfer over joint encoder sharing.
Abstract
We tackle the generalized category discovery (GCD) problem, which aims to discover novel classes in unlabeled datasets by leveraging knowledge of known classes. Previous works exploit known-class knowledge through shared representation spaces. Despite their progress, our analysis shows that novel classes can already achieve impressive clustering results in the feature space of a model pre-trained on known classes, suggesting that existing methods may not fully utilize known-class knowledge. To address this, we introduce a novel concept learning framework for GCD, named ConceptGCD, that categorizes concepts into two types, derivable and underivable from known-class concepts, and adopts a stage-wise strategy to learn them separately. Specifically, our framework first extracts known-class concepts with a model pre-trained on the known classes and then produces derivable concepts from them through a generator layer trained with a covariance-augmented loss. Subsequently, we expand the generator layer to learn underivable concepts in a balanced manner, ensured by a concept score normalization strategy, and integrate a contrastive loss to preserve previously learned concepts. Extensive experiments on various benchmark datasets demonstrate the superiority of our approach over previous state-of-the-art methods. Code will be available soon.
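
To make two of the ingredients named above more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the form of the covariance-augmented loss or of the concept score normalization, so both functions below are generic stand-ins chosen for illustration: a decorrelation penalty on the off-diagonal entries of the concept covariance matrix, and per-concept standardization across a batch. All function names are hypothetical.

```python
import math


def column_means(scores):
    """Mean of each concept dimension over the batch (rows = samples)."""
    n = len(scores)
    return [sum(row[j] for row in scores) / n for j in range(len(scores[0]))]


def covariance_matrix(scores):
    """Sample covariance (divides by n - 1) between concept dimensions."""
    n, d = len(scores), len(scores[0])
    mu = column_means(scores)
    centered = [[row[j] - mu[j] for j in range(d)] for row in scores]
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


def off_diagonal_penalty(scores):
    """ASSUMED stand-in for a covariance-based diversity term.

    Sums the squared off-diagonal covariance entries and divides by the
    number of concepts. The value is large when concepts are redundant
    and zero when they are pairwise uncorrelated on this batch.
    """
    cov = covariance_matrix(scores)
    d = len(cov)
    return sum(cov[a][b] ** 2 for a in range(d) for b in range(d) if a != b) / d


def normalize_concept_scores(scores, eps=1e-8):
    """ASSUMED form of concept score normalization.

    Standardizes each concept dimension across the batch so that no
    single concept dominates purely because of its scale.
    """
    n, d = len(scores), len(scores[0])
    mu = column_means(scores)
    std = [
        math.sqrt(sum((row[j] - mu[j]) ** 2 for row in scores) / n)
        for j in range(d)
    ]
    return [[(row[j] - mu[j]) / (std[j] + eps) for j in range(d)] for row in scores]


if __name__ == "__main__":
    redundant = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]      # two identical concepts
    decorrelated = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]   # uncorrelated concepts
    print(off_diagonal_penalty(redundant))
    print(off_diagonal_penalty(decorrelated))
    print(normalize_concept_scores([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]))
```

In this toy setting the penalty separates a batch with two identical concept dimensions from one whose dimensions are uncorrelated, and the normalization maps two concept dimensions that differ only in scale to nearly the same scores. How the actual method weights or combines such terms is not stated in the abstract.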
