When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach

Vaibhav Rathore, Shubhranil B, Saikat Dutta, Sarthak Mehrotra, Zsolt Kira, Biplab Banerjee

TL;DR

DG-GCD addresses clustering of known and novel classes under domain shifts while withholding target-domain data during training. It introduces DG2CD-Net, which leverages episodic training with source and synthetic domains and an adaptive task-vector aggregation to build a domain-independent embedding space suitable for generalized category discovery. A margin-based open-set domain-adaptation objective, coupled with supervised and unsupervised contrastive losses, helps separate known and novel classes, while a validation-driven weighting scheme selects the most generalizable episode models for updating the global model. Synthetic domains generated by Instruct-Pix2Pix, prompted via ChatGPT, diversify training and improve generalization without leaking target-domain information. Empirical results on PACS, Office-Home, and DomainNet demonstrate strong performance gains over existing DG-GCD baselines, and ablation analyses highlight the importance of synthetic data, episodic updates, and the adaptive task-vector mechanism for robust domain generalization and fine-grained novel category discovery.

Abstract

Generalized Category Discovery (GCD) clusters base and novel classes in a target domain using supervision from a source domain that contains only base classes. Current methods often falter under distribution shifts and typically require access to target data during training, which can be impractical. To address this issue, we introduce the novel paradigm of Domain Generalization in GCD (DG-GCD), where only source data is available for training, while the target domain, with a distinct data distribution, remains unseen until inference. To this end, our solution, DG2CD-Net, aims to construct a domain-independent, discriminative embedding space for GCD. The core innovation is an episodic training strategy that enhances cross-domain generalization by adapting a base model on tasks derived from the source domain and from synthetic domains generated by a foundation model. Each episode focuses on a cross-domain GCD task; task setups are diversified across episodes, and open-set domain adaptation is combined with a novel margin loss and representation learning to progressively optimize the feature space. To capture the effects of fine-tuning on the base model, we extend task arithmetic by adaptively weighting the local task vectors of the fine-tuned models according to their GCD performance on a validation distribution. This episodic update mechanism boosts the adaptability of the base model to unseen targets. Experiments across three datasets confirm that DG2CD-Net outperforms existing GCD methods customized for DG-GCD.
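
The adaptive task-arithmetic update described in the abstract can be illustrated with a minimal sketch. It assumes that model parameters are stored as a dictionary of NumPy arrays and that the per-episode weights come from a softmax over validation scores; the abstract only says the weights are based on GCD performance on a validation distribution, so the softmax normalization and the names `softmax` and `adaptive_update` are hypothetical choices for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D sequence of scores."""
    scores = np.asarray(scores, dtype=float)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def adaptive_update(theta_global, local_models, val_scores):
    """Merge episode-specific models into the global model.

    theta_global : dict name -> np.ndarray, current global parameters
    local_models : list of dicts with the same keys (fine-tuned per episode)
    val_scores   : one validation GCD score per episode (e.g. the All metric)
    """
    # ASSUMPTION: weights are a softmax of the validation scores.
    weights = softmax(val_scores)
    new_global = {}
    for name, base in theta_global.items():
        # Task vector of each episode = fine-tuned weights minus global weights.
        merged_delta = sum(
            w * (model[name] - base) for w, model in zip(weights, local_models)
        )
        new_global[name] = base + merged_delta
    return new_global, weights

# Toy usage: two episodes pulling in opposite directions with equal scores.
theta = {"w": np.zeros(2)}
episodes = [{"w": np.array([1.0, 1.0])}, {"w": np.array([-1.0, -1.0])}]
updated, w = adaptive_update(theta, episodes, val_scores=[0.5, 0.5])
```

With equal validation scores the two task vectors cancel; when one episode scores higher, its task vector dominates the update, which is the intended effect of performance-based weighting.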

Paper Structure

This paper contains 33 sections, 9 equations, 13 figures, 18 tables, 1 algorithm.

Figures (13)

  • Figure 1: We present a novel variant of GCD, Domain Generalization for Generalized Category Discovery, where a model is trained on a source domain (photo) and evaluated on a target domain (cartoon). During inference, the model must cluster seen classes while also identifying and distinguishing novel classes.
  • Figure 2: Proposed episodic training: A pre-trained global model is updated using task vectors from $n_e$ episode-specific fine-tuned models, leveraging a novel dynamic weighting scheme. This scheme adjusts the task vectors based on their GCD generalization performance on a held-out unseen validation distribution.
  • Figure 3: Sign Conflicts—opposing gradient updates leading to greater parameter divergence across episodes in various model merging methods for our episodic training on PACS. Lower values indicate improved training stability and accuracy.
  • Figure 4: The FID scores [eiter1994computing] between Real-World and the three other Office-Home [officehome] domains, along with the average FID between Real-World and the InstructPix2Pix-driven and manually crafted synthesized domains, confirm that the InstructPix2Pix-driven synthetic domains introduce diverse distribution shifts into our episodic training process, leading to enhanced generalizability of $\theta_{\text{global}}$ through the proposed training scheme.
  • Figure 5: Transition from $\theta_{\text{global}}^{g-1}$ to $\theta_{\text{global}}^{g}$ in our training strategy: The left panel illustrates the two-way episodic training process. Starting with episode-specific datasets $(\mathcal{D}_{\mathcal{S}}^{1_g}, \mathcal{D}^{1_g}_{\text{syn}})$ and $(\mathcal{D}_{\mathcal{S}}^{2_g}, \mathcal{D}^{2_g}_{\text{syn}})$, we fine-tune the previous global model $\mathcal{F}^{g-1}$ together with episode-specific adversarial classifiers $\mathcal{F}_c^{1_g}$ and $\mathcal{F}_c^{2_g}$ on the local CD-GCD tasks. This produces fine-tuned models with updated weights $\theta_{\text{local}}^{1_g}$ and $\theta_{\text{local}}^{2_g}$. We calculate the task vectors ($\delta_g^1, \delta_g^2$) for the fine-tuned models. GCD generalization is subsequently assessed on $\mathcal{D}_{\text{valid}}$ using the All metric, resulting in generalization weights $(w_g^1, w_g^2)$ for the fine-tuned models. The right panel shows how the global models are updated through task vector aggregations for baseline TA [task_arithmetic] and ours. Red and Blue denote the episode-specific data/processing.
  • ...and 8 more figures
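
The sign-conflict quantity referred to in the Figure 3 caption can be sketched as follows. The caption does not give a formula, so this assumes one plausible definition: the fraction of parameter coordinates for which at least one episode's task vector is positive and at least one is negative. The function name `sign_conflict_rate` is hypothetical, and the paper may measure conflicts differently.

```python
import numpy as np

def sign_conflict_rate(task_vectors):
    """Fraction of coordinates whose task-vector entries disagree in sign.

    task_vectors : list of arrays of identical shape, one per episode.
    ASSUMPTION: a coordinate is "in conflict" when some episode has a
    strictly positive entry and another has a strictly negative entry;
    zero entries are ignored.
    """
    stacked = np.stack([np.ravel(tv) for tv in task_vectors])
    has_positive = (stacked > 0).any(axis=0)
    has_negative = (stacked < 0).any(axis=0)
    return float(np.mean(has_positive & has_negative))

# Toy usage: only the second coordinate has opposing signs.
delta_1 = np.array([1.0, -1.0, 0.0, 2.0])
delta_2 = np.array([1.0, 1.0, -3.0, 2.0])
rate = sign_conflict_rate([delta_1, delta_2])
```

Under this definition a lower rate means the episode-specific updates agree more often in direction, consistent with the caption's statement that lower values indicate more stable training.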