CDAD-Net: Bridging Domain Gaps in Generalized Category Discovery
Sai Bhargav Rongali, Sarthak Mehrotra, Ankit Jha, Mohamad Hassan N C, Shirsha Bose, Tanisha Gupta, Mainak Singha, Biplab Banerjee
TL;DR
Across Domain Generalized Category Discovery (AD-GCD) is the task of clustering unlabeled target data that originate from a different distribution than the labeled source data. The authors propose CDAD-Net, which combines entropy-driven cross-domain alignment using target-to-source prototype distances, a neighborhood-centric contrastive learning objective for the target domain, and a conditional image inpainting loss to capture fine-grained semantic structure. The architecture uses a ViT backbone, a domain discriminator, and a patch-based inpainting decoder, with a two-stage training schedule and semi-supervised K-means inference. Experiments on Office-Home, DomainNet, and PACS show reported gains of 8-15% over prior methods, indicating effective cross-domain knowledge transfer and robust clustering of novel classes.
Abstract
In Generalized Category Discovery (GCD), we cluster unlabeled samples of known and novel classes, leveraging a training dataset of known classes. A salient challenge arises due to domain shifts between these datasets. To address this, we present a novel setting: Across Domain Generalized Category Discovery (AD-GCD) and bring forth CDAD-NET (Class Discoverer Across Domains) as a remedy. CDAD-NET is architected to synchronize potential known class samples across both the labeled (source) and unlabeled (target) datasets, while emphasizing the distinct categorization of the target data. To facilitate this, we propose an entropy-driven adversarial learning strategy that accounts for the distance distributions of target samples relative to source-domain class prototypes. In parallel, the discriminative nature of the shared space is upheld through a fusion of three metric learning objectives. In the source domain, our focus is on refining the proximity between samples and their affiliated class prototypes, while in the target domain, we integrate a neighborhood-centric contrastive learning mechanism, enriched with an adept neighbor-mining approach. To further accentuate the nuanced feature interrelation among semantically aligned images, we champion the concept of conditional image inpainting, underscoring the premise that semantically analogous images prove more efficacious to the task than their disjointed counterparts. Experimentally, CDAD-NET eclipses existing literature with a performance increment of 8-15% on three AD-GCD benchmarks we present.
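To make the idea of a distance distribution relative to source-domain class prototypes more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the Euclidean distance, the softmax over negative distances, and the `temperature` parameter are all assumptions made for illustration, and the abstract does not specify how the resulting entropy enters the adversarial objective.

```python
import math


def class_prototypes(features, labels):
    """Mean feature vector per known class (assumed prototype definition)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def distance_entropy(x, prototypes, temperature=1.0):
    """Entropy of a softmax over negative Euclidean distances to prototypes.

    Hypothetical scoring rule: low entropy means the sample sits close to
    one prototype, high entropy means it is ambiguous among known classes.
    """
    logits = [-math.dist(x, p) / temperature for p in prototypes.values()]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Toy 2-D source features for two known classes (made-up values).
source_feats = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.0, 4.2]]
source_labels = [0, 0, 1, 1]
protos = class_prototypes(source_feats, source_labels)

near_known = distance_entropy([0.1, 0.1], protos)    # close to class 0
ambiguous = distance_entropy([2.05, 2.05], protos)   # equidistant from both
print(near_known < ambiguous)
```

In this toy setup, a target sample near one prototype yields a peaked distribution and low entropy, while a sample equidistant from the prototypes yields the maximum entropy for two classes. Under the assumptions above, such a score could serve as a soft indicator of which target samples are likely to belong to known classes and are therefore candidates for alignment with the source domain.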
