CDAD-Net: Bridging Domain Gaps in Generalized Category Discovery

Sai Bhargav Rongali, Sarthak Mehrotra, Ankit Jha, Mohamad Hassan N C, Shirsha Bose, Tanisha Gupta, Mainak Singha, Biplab Banerjee

TL;DR

Across Domain Generalized Category Discovery (AD-GCD) is the task of clustering unlabeled target data that originates from a different distribution than the labeled source data. The authors propose CDAD-Net, which combines entropy-driven cross-domain alignment using target-to-source prototype distances, a neighborhood-centric contrastive learning objective for the target domain, and a conditional image inpainting loss to capture fine-grained semantic structure. The architecture uses a ViT backbone, a domain discriminator, and a patch-based inpainting decoder, with a two-stage training schedule and semi-supervised K-means inference. Experiments on Office-Home, DomainNet, and PACS report gains of 8-15% over prior methods, indicating effective cross-domain knowledge transfer and robust clustering of novel classes.

Abstract

In Generalized Category Discovery (GCD), we cluster unlabeled samples of known and novel classes, leveraging a training dataset of known classes. A salient challenge arises due to domain shifts between these datasets. To address this, we present a novel setting, Across Domain Generalized Category Discovery (AD-GCD), and bring forth CDAD-NET (Class Discoverer Across Domains) as a remedy. CDAD-NET is architected to synchronize potential known class samples across both the labeled (source) and unlabeled (target) datasets, while emphasizing the distinct categorization of the target data. To facilitate this, we propose an entropy-driven adversarial learning strategy that accounts for the distance distributions of target samples relative to source-domain class prototypes. In parallel, the discriminative nature of the shared space is upheld through a fusion of three metric learning objectives. In the source domain, our focus is on refining the proximity between samples and their affiliated class prototypes, while in the target domain, we integrate a neighborhood-centric contrastive learning mechanism, enriched with an adept neighbor-mining approach. To further accentuate the nuanced feature interrelation among semantically aligned images, we champion the concept of conditional image inpainting, underscoring the premise that semantically analogous images prove more efficacious to the task than their disjointed counterparts. Experimentally, CDAD-NET eclipses existing literature with a performance increment of 8-15% on three AD-GCD benchmarks we present.
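
To make the idea of an entropy over "distance distributions of target samples relative to source-domain class prototypes" concrete, here is a minimal sketch. It is not the authors' implementation: the softmax over negative Euclidean distances, the `temperature` parameter, and the function name `prototype_distance_entropy` are all assumptions chosen for illustration. The intuition it shows is that a target sample close to one source prototype yields a peaked (low-entropy) distribution, while a sample that is equally far from all prototypes yields a flat (high-entropy) one.

```python
import math

def prototype_distance_entropy(embedding, prototypes, temperature=1.0):
    """Entropy of a softmax over negative distances to class prototypes.

    Assumption for illustration: Euclidean distance and a softmax with a
    hypothetical `temperature`; the paper may define this differently.
    """
    distances = [math.dist(embedding, p) for p in prototypes]
    logits = [-d / temperature for d in distances]
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Three toy source prototypes in a 2-D embedding space.
prototypes = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]

# A target sample next to the first prototype: peaked distribution.
near_known = prototype_distance_entropy([0.1, 0.0], prototypes)

# A target sample equidistant from all prototypes: uniform distribution.
ambiguous = prototype_distance_entropy([2.0, 2.0], prototypes)
```

In this toy setup `ambiguous` equals the maximum possible entropy for three classes, `log(3)`, and `near_known` is much lower. Under the stated assumptions, such a score could be used to weight which target samples are treated as likely known-class samples during alignment.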

Paper Structure (10 sections, 4 equations, 6 figures, 7 tables)

This paper contains 10 sections, 4 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: We introduce the problem setting of across domain generalized category discovery (AD-GCD), which differs from the traditional GCD setting [vaze2022generalized] in that the labeled and unlabeled data arise from different data distributions. We call the labeled data the Source domain and the unlabeled data the Target domain.
  • Figure 2: Architecture overview and training pipeline of CDAD-Net. The DINO pre-trained ViT encoder $\mathcal{F}_e$ is first fine-tuned on $\mathcal{D}_{\mathcal{L}}$. Subsequently, the model is trained on $\mathcal{D}_{\mathcal{L}} \cup \mathcal{D}_{\mathcal{U}}$ in two stages in each training epoch, using the domain alignment objective $\mathcal{L}_{align}$ given $(\mathcal{F}_e, \mathcal{F}_{disc})$, and the cumulative metric objective $\mathcal{L}_{con}^l + \mathcal{L}_{con}^u + \mathcal{L}_{inp}^{ul}$ given $(\mathcal{F}_e, \mathcal{F}_d)$. Inference is carried out by semi-supervised K-means applied to $\mathcal{F}_e(\mathcal{D}_{\mathcal{L}}) \cup \mathcal{F}_e(\mathcal{D}_{\mathcal{U}})$. The number of target domain clusters is estimated using the elbow method.
  • Figure 3: t-SNE visualizations of the target domain clusters produced by a pre-trained ViT, SimGCD [simgcd], SimGCD with OSDA [saito2018open], and CDAD-Net on the PACS dataset. CDAD-Net generates the most distinctive embedding space.
  • Figure 4: Openness analysis of CDAD-Net and competitors on the Office-Home and DomainNet datasets.
  • Figure 5: The attention maps produced using Grad-CAM.
  • ...and 1 more figures
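
The semi-supervised K-means inference step mentioned in the Figure 2 caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: it takes plain Python lists in place of the encoder features $\mathcal{F}_e(\mathcal{D}_{\mathcal{L}})$ and $\mathcal{F}_e(\mathcal{D}_{\mathcal{U}})$, assumes every known class has at least one labeled sample and that `k` is at least the number of known classes, and uses a hypothetical farthest-point seeding for the novel-class centroids. The key property it shows is that labeled samples stay pinned to their class cluster while unlabeled samples are free to move.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def semi_supervised_kmeans(labeled, labels, unlabeled, k, iters=20):
    """Cluster `unlabeled` into k clusters; labeled points keep their class.

    Clusters 0..n_known-1 correspond to the classes in `labels`; the
    remaining k - n_known clusters are candidates for novel classes.
    """
    n_known = max(labels) + 1
    # Known-class centroids start at the labeled class means.
    centroids = [
        mean([x for x, y in zip(labeled, labels) if y == c])
        for c in range(n_known)
    ]
    # Assumed (hypothetical) init for novel clusters: farthest-point seeding.
    while len(centroids) < k:
        centroids.append(
            max(unlabeled, key=lambda x: min(math.dist(x, c) for c in centroids))
        )
    assign = []
    for _ in range(iters):
        # Unlabeled points go to their nearest centroid.
        assign = [
            min(range(k), key=lambda c: math.dist(x, centroids[c]))
            for x in unlabeled
        ]
        # Centroids are updated from assigned unlabeled points plus the
        # labeled points pinned to that class.
        for c in range(k):
            members = [x for x, a in zip(unlabeled, assign) if a == c]
            members += [x for x, y in zip(labeled, labels) if y == c]
            if members:
                centroids[c] = mean(members)
    return assign, centroids

# Toy features: two known classes plus one group far from both.
labeled = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
labels = [0, 0, 1, 1]
unlabeled = [[0.1, 0.2], [4.9, 5.2], [10.0, 0.0], [10.2, 0.1], [9.9, -0.1]]

assign, centroids = semi_supervised_kmeans(labeled, labels, unlabeled, k=3)
```

Here the first two unlabeled points join the known clusters 0 and 1, and the remaining three form the extra cluster 2. The value of `k` is passed in directly; the caption states that the number of target clusters is estimated with the elbow method, which this sketch does not implement.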