Cross-domain Open-world Discovery

Shuo Wen, Maria Brbic

TL;DR

The paper addresses the problem of discovering unseen classes under domain shift while labeling seen classes, proposing a Cross-domain Open-world Discovery (CD-OWD) setting. It introduces CROW, a cluster-then-match prototype-based method that leverages the structured latent space of foundation models to cluster target data into target prototypes, match them to seen prototypes with a robust many-to-many mapping, and fine-tune the representation via a combined cross-entropy and entropy-regularization objective. Through extensive experiments on Office, OfficeHome, VisDA, and DomainNet across 75 settings, CROW consistently outperforms open-world SSL and universal domain adaptation baselines, achieving about an 8% improvement in the H-score on average and demonstrating robustness to threshold choices and unknown numbers of novel classes. The approach highlights the practical impact of using foundation-model representations for cross-domain open-world discovery and offers a scalable, flexible framework for simultaneous seen-class recognition and novel-class discovery in real-world, domain-shifted environments.

Abstract

In many real-world applications, test data may commonly exhibit categorical shifts, characterized by the emergence of novel classes, as well as distribution shifts arising from feature distributions different from the ones the model was trained on. However, existing methods either discover novel classes in the open-world setting or assume domain shifts without the ability to discover novel classes. In this work, we consider a cross-domain open-world discovery setting, where the goal is to assign samples to seen classes and discover unseen classes under a domain shift. To address this challenging problem, we present CROW, a prototype-based approach that introduces a cluster-then-match strategy enabled by a well-structured representation space of foundation models. In this way, CROW discovers novel classes by robustly matching clusters with previously seen classes, followed by fine-tuning the representation space using an objective designed for cross-domain open-world discovery. Extensive experimental results on image classification benchmark datasets demonstrate that CROW outperforms alternative baselines, achieving an 8% average performance improvement across 75 experimental settings.

Paper Structure

This paper contains 40 sections, 6 equations, 6 figures, 24 tables.

Figures (6)

  • Figure 1: Illustration of the cross-domain open-world discovery setting. In the cross-domain open-world discovery setting, the goal is to assign samples to previously seen classes and discover new classes under a domain shift. In the example, novel classes such as 'fish' and 'turtle' exist in the unlabeled data. Additionally, the labeled samples are from the real-world domain, while the unlabeled samples are sketches. In this setting, the goal is to assign each unlabeled sample to either a seen category ('dog', 'cat', 'bird') or to a newly discovered novel category ('novel 1', 'novel 2').
  • Figure 2: Conceptual overview of CROW. (i) CROW extracts features from a foundation model for both source and target samples. Seen prototypes are then obtained using labeled source samples, while target prototypes are obtained by clustering target samples. (ii) CROW matches seen classes to target prototypes using the source samples. Unmatched target prototypes are identified as unseen prototypes. (iii) CROW combines seen prototypes and unseen prototypes. (iv) Finally, CROW fine-tunes the foundation model to update the representation space and the prototypes.
  • Figure 3: The process of matching. We first obtain the co-occurrence matrix $\Gamma$ between target prototypes and seen classes. Then, we apply a column-wise softmax to the co-occurrence matrix $\Gamma$ to get the distribution matrix $D$. Finally, we apply a threshold $\tau$ to each $D_{i,j}$ to obtain the matching matrix $M$. $M_{i,j} = 1$ means the class $C_j$ is matched to the prototype $p_i$.
  • Figure 4: Confident samples for seen and unseen classes on VisDA. The synthetic images are from the source, and the real-world images are from the target.
  • Figure 5: Sensitivity to the threshold. $\tau$ is the original threshold provided by our method and previous works. We modify $\tau$ by scaling it with a multiplication factor.
  • ...and 1 more figures
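
The matching step described in the Figure 3 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cooccurrence` and `match` are hypothetical, the rule that each source sample votes for its most cosine-similar target prototype is an assumed way to build $\Gamma$, the strict comparison $D_{i,j} > \tau$ is an assumed direction for the threshold, and the value `tau = 0.3` is arbitrary. Rows index target prototypes $p_i$ and columns index seen classes $C_j$, as in the caption.

```python
import numpy as np

def cooccurrence(source_feats, source_labels, target_protos, num_seen):
    """Build Gamma, where Gamma[i, j] counts source samples of seen class j
    assigned to target prototype i.
    Assumption: a sample is assigned to its most cosine-similar prototype."""
    f = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    p = target_protos / np.linalg.norm(target_protos, axis=1, keepdims=True)
    nearest = np.argmax(f @ p.T, axis=1)            # prototype index per sample
    gamma = np.zeros((target_protos.shape[0], num_seen))
    np.add.at(gamma, (nearest, source_labels), 1.0)  # accumulate counts
    return gamma

def match(gamma, tau):
    """Column-wise softmax of Gamma gives D; thresholding D gives M.
    Assumption: M[i, j] = 1 when D[i, j] > tau (strict comparison)."""
    z = gamma - gamma.max(axis=0, keepdims=True)     # stabilise the exponent
    d = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    m = (d > tau).astype(int)
    return d, m

# Toy features: 3 source samples, 2 seen classes, 3 target prototypes.
feats = np.array([[2.0, 0.1], [0.1, 3.0], [0.2, 1.0]])
labels = np.array([0, 1, 1])
protos = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
gamma_small = cooccurrence(feats, labels, protos, num_seen=2)

# Hand-written Gamma: 4 target prototypes, 2 seen classes.
gamma_demo = np.array([[5.0, 0.0],
                       [0.0, 4.0],
                       [0.0, 4.0],
                       [0.0, 0.0]])
D, M = match(gamma_demo, tau=0.3)

# Prototypes matched to no seen class are treated as unseen prototypes.
unseen = np.flatnonzero(M.sum(axis=1) == 0)
print(M)
print("unseen prototype indices:", unseen)
```

In the hand-written example, the second seen class is matched to two target prototypes, which shows why the mapping is not restricted to one-to-one, and the last prototype receives no match and is therefore kept as an unseen prototype.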