
FOCUS: Bridging Fine-Grained Recognition and Open-World Discovery across Domains

Vaibhav Rathore, Divyam Gupta, Moloud Abdar, Subhasis Chaudhuri, Biplab Banerjee

Abstract

We introduce the first unified framework for *Fine-Grained Domain-Generalized Generalized Category Discovery* (FG-DG-GCD), bringing open-world recognition closer to real-world deployment under domain shift. Unlike conventional GCD, which assumes labeled and unlabeled data come from the same distribution, DG-GCD learns only from labeled source data and must both recognize known classes and discover novel ones in unseen, unlabeled target domains. This problem is especially challenging in fine-grained settings, where subtle inter-class differences and large intra-class variation make domain generalization significantly harder. To support systematic evaluation, we establish the first *FG-DG-GCD benchmarks* by creating identity-preserving *painting* and *sketch* domains for CUB-200-2011, Stanford Cars, and FGVC-Aircraft using controlled diffusion-adapter stylization. On top of this, we propose FoCUS, a single-stage framework that combines *Domain-Consistent Parts Discovery* (DCPD) for geometry-stable part reasoning with *Uncertainty-Aware Feature Augmentation* (UFA) for confidence-calibrated feature regularization through uncertainty-guided perturbations. Extensive experiments show that FoCUS outperforms strong GCD, FG-GCD, and DG-GCD baselines by **3.28%**, **9.68%**, and **2.07%**, respectively, in clustering accuracy on the proposed benchmarks. It also remains competitive on coarse-grained DG-GCD tasks while achieving nearly **3x** higher computational efficiency than the current state of the art. ^[Code and datasets will be released upon acceptance.]

Paper Structure (47 sections, 24 equations, 17 figures, 17 tables, 1 algorithm)

Figures (17)

  • Figure 1: FG-DG-GCD extends DG-GCD \cite{dg2net} to the fine-grained regime, where a model trained on labeled source classes must simultaneously recognize known and discover novel categories in unseen, distribution-shifted domains. Compared with coarse-grained DG-GCD \cite{dg2net, Rathore2025HiDISC}, this setting is substantially harder due to subtle inter-class cues, high intra-class similarity, and stronger sensitivity to domain-induced appearance changes. FoCUS addresses this challenge through part-consistent representation learning and uncertainty-aware open-space regularization, outperforming existing GCD, FG-GCD, and DG-GCD baselines on the proposed benchmarks (Table \ref{tab:results}).
  • Figure 2: Overview of FoCUS. A ViT encoder produces global [CLS] and patch-level features. The Domain-Consistent Parts Discovery (DCPD) module extracts geometry-stable part features $\mathbf{f}_{\text{part}}$, which are fused with the global representation $\mathbf{f}_{\text{cls}}$ and optimized using InfoNCE ($\mathcal{L}_{\text{InfoNCE}}$) and supervised contrastive ($\mathcal{L}_{\text{SCon}}$) losses for fine-grained discrimination. In parallel, the Uncertainty-Aware Feature Augmentation (UFA) module uses EMA-based class statistics $(\boldsymbol{\mu}_y, \boldsymbol{\Sigma}_y)$ to synthesize outlier features $\widetilde{\mathbf{\mathcal{Z}}}$ from class-tail, between-class, and hypersphere regions. A shared classifier $g(\cdot)$ is trained on real features using cross-entropy, and on synthetic outliers using energy-based outlier exposure ($\mathcal{L}_{\text{OE}}$) and entropy maximization ($\mathcal{L}_{\text{ENT}}$). Together, these two pathways yield discriminative, robust, and calibrated embeddings for category discovery under unseen domain shift.
  • Figure 3: Domain-Consistent Attention across Domains. Compared to prior part-localization methods (FoCUS+APL \cite{cvprapl}), our DCPD module maintains stable attention on the same geometric regions (e.g., bird’s head) across domain shifts from Real$\rightarrow$Painting$\rightarrow$Sketch, demonstrating strong domain-invariant part discovery. See the supplementary material for a quantitative validation using the KL divergence between cross-domain attention maps.
  • Figure 4: Progressive Attention Refinement within DCPD. From left to right: (1) Self-Attention Priors provide initial coarse localization; (2) Image-Conditioned Part Queries refine attention to align with instance-specific geometries; (3) Differentiable Patch-to-Part Assignment creates sharp, spatially exclusive part delineations; and finally, (4) Part-Global Fusion integrates these precise structural cues into a unified representation.
  • Figure 5: t-SNE \cite{van2008visualizing} visualization of DCPD embeddings, with source features (colored) and UFA outliers (black). The outliers occupy low-density boundaries, modeling open-space uncertainty and improving calibration.
  • ...and 12 more figures
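
The UFA pathway described in the Figure 2 caption (EMA class statistics, synthetic outliers, and the energy and entropy terms) can be illustrated with a small NumPy sketch. This is not the authors' implementation and rests on several assumptions: the covariance $\boldsymbol{\Sigma}_y$ is simplified to a diagonal variance, only the class-tail region is sampled (the between-class and hypersphere regions are omitted), and the outlier-exposure term uses a squared-hinge form that is common in energy-based outlier exposure but may differ from the paper's $\mathcal{L}_{\text{OE}}$. All function names, `momentum`, `radius`, and `margin` are hypothetical.

```python
import numpy as np

def ema_update(mu, var, feats, momentum=0.9):
    """EMA update of one class's mean and diagonal variance from a feature batch."""
    new_mu = momentum * mu + (1.0 - momentum) * feats.mean(axis=0)
    new_var = momentum * var + (1.0 - momentum) * feats.var(axis=0)
    return new_mu, new_var

def sample_tail_outliers(mu, var, n, rng, radius=3.0):
    """Draw n points at Mahalanobis distance `radius` from the class mean.

    Assumption: a diagonal covariance, so each axis is scaled by sqrt(var).
    """
    direction = rng.standard_normal((n, mu.shape[0]))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    return mu + radius * np.sqrt(var) * direction

def energy(logits):
    """Free energy E = -logsumexp(logits), computed in a numerically stable way."""
    m = logits.max(axis=1, keepdims=True)
    return -(m[:, 0] + np.log(np.exp(logits - m).sum(axis=1)))

def entropy(logits):
    """Shannon entropy of the softmax distribution for each row of logits."""
    m = logits.max(axis=1, keepdims=True)
    logp = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return -(np.exp(logp) * logp).sum(axis=1)

def outlier_losses(logits_out, margin=-1.0):
    """Losses on synthetic outliers (assumed forms, not taken from the paper).

    l_oe:  squared hinge that penalizes outlier energy below `margin`.
    l_ent: negative mean entropy, so minimizing it maximizes entropy.
    """
    l_oe = np.mean(np.maximum(0.0, margin - energy(logits_out)) ** 2)
    l_ent = -np.mean(entropy(logits_out))
    return l_oe, l_ent

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((32, 8))            # stand-in for one class's features
    mu, var = ema_update(np.zeros(8), np.ones(8), feats)
    outliers = sample_tail_outliers(mu, var, 16, rng)
    weights = rng.standard_normal((8, 5))           # stand-in for the shared classifier g
    print(outlier_losses(outliers @ weights))
```

In a training loop, these two terms would be added to the cross-entropy on real features, with weights that this sketch leaves unspecified.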