COCA: Classifier-Oriented Calibration via Textual Prototype for Source-Free Universal Domain Adaptation

Xinghong Liu, Yi Zhou, Tao Zhou, Chun-Mei Feng, Ling Shao

TL;DR

The paper tackles source-free universal domain adaptation by shifting the focus from adapting the image encoder to calibrating the classifier of a few-shot learner built on a vision-language model (VLM), using textual prototypes. It introduces ACTP (autonomous calibration via textual prototype) to generate positive textual prototypes and negative image prototypes for self-training, and MIECI to enhance contextual mutual information via a masked-image pathway and an EMA teacher. The resulting COCA framework reports state-of-the-art performance across the OPDA, OSDA, and PDA settings on OfficeHome, VisDA-2017, and DomainNet while requiring only few-shot source data. The results suggest that VLMs encode cross-domain knowledge, which allows robust adaptation at a lower labeling cost.
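The EMA teacher mentioned above follows a standard pattern: the teacher's weights track a slowly moving average of the student's weights. The following is a minimal sketch of that generic update, not the paper's implementation. The function name `ema_update`, the dictionary layout of the parameters, the toy shapes, and the momentum value are all assumptions made for illustration.

```python
import numpy as np


def ema_update(teacher, student, momentum=0.999):
    """Return new teacher weights after one exponential-moving-average step.

    teacher, student: dicts mapping parameter names to NumPy arrays
    (a hypothetical layout chosen for this sketch).
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# Toy classifier parameters (assumed shapes): 3 classes, 4-dim features.
student = {"weight": np.ones((3, 4)), "bias": np.zeros(3)}
teacher = {"weight": np.zeros((3, 4)), "bias": np.zeros(3)}

# One update with an illustrative momentum of 0.9: the teacher moves
# 10% of the way from its old weights toward the student's weights.
teacher = ema_update(teacher, student, momentum=0.9)
```

A momentum close to 1 makes the teacher change slowly, which is the usual reason for using it as a stable target during self-training.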

Abstract

Universal domain adaptation (UniDA) aims to address domain and category shifts across data sources. Recently, due to more stringent data restrictions, researchers have introduced source-free UniDA (SF-UniDA). SF-UniDA methods eliminate the need for direct access to source samples when performing adaptation to the target domain. However, existing SF-UniDA methods still require an extensive quantity of labeled source samples to train a source model, resulting in significant labeling costs. To tackle this issue, we present a novel plug-and-play classifier-oriented calibration (COCA) method. COCA, which exploits textual prototypes, is designed for the source models based on few-shot learning with vision-language models (VLMs). It endows the VLM-powered few-shot learners, which are built for closed-set classification, with the unknown-aware ability to distinguish common and unknown classes in the SF-UniDA scenario. Crucially, COCA is a new paradigm to tackle SF-UniDA challenges based on VLMs, which focuses on classifier instead of image encoder optimization. Experiments show that COCA outperforms state-of-the-art UniDA and SF-UniDA models.

Paper Structure (21 sections, 19 equations, 8 figures, 12 tables, 1 algorithm)


Figures (8)

  • Figure 1: (a) UniDA methods adapt both the image encoder and classifier module. (b) SF-UniDA methods freeze the classifier module and adapt the image encoder. (c) We leverage and freeze image and text encoders and adapt the closed-set classifier.
  • Figure 2: (a) Our method requires far fewer labeled source samples per class than traditional UniDA/SF-UniDA models. (b) Our plug-and-play method successfully adapts the VLM-powered few-shot learner [cross-modal_adaptation] to new target domains. (c) COCA exhibits more robustness against variations in the hyperparameter $K$ for K-means and outperforms the earlier SF-UniDA model GLC.
  • Figure 3: Overview of the classifier-oriented calibration (COCA) method. COCA adapts the closed-set classifier $h_\theta$ to the target domain to tackle the SF-UniDA challenge.
  • Figure 4: Pipeline of the autonomous calibration via textual prototype (ACTP) module. Here, horse and bike are the common classes, bus is the source-private class, and airplane is the unknown class. In Step 3, ACTP generates pseudo labels via Eq. \ref{eq:generate_pseudo_label}.
  • Figure 5: COCA at the inference phase.
  • ...and 3 more figures
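
The ACTP pipeline in Figure 4 pairs textual prototypes with image prototypes obtained from K-means and then assigns pseudo labels. The sketch below is a loose illustration of that idea using the class roles from the caption. The matching rule (mutual nearest neighbours under cosine similarity) is an assumption made for this example and is not confirmed to be the paper's pseudo-label equation. The function names, the toy 4-dimensional features, and the `UNKNOWN` marker are likewise hypothetical.

```python
import numpy as np

UNKNOWN = -1  # hypothetical marker for target samples treated as unknown


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def pseudo_labels(text_protos, image_protos, cluster_ids):
    """Label samples via their cluster's matched class, else UNKNOWN.

    Assumed rule: a cluster is matched to a class only when each is the
    other's nearest neighbour in cosine similarity.
    """
    sim = l2_normalize(image_protos) @ l2_normalize(text_protos).T  # (K, C)
    best_class_for_cluster = sim.argmax(axis=1)
    best_cluster_for_class = sim.argmax(axis=0)
    cluster_label = np.full(len(image_protos), UNKNOWN)
    for k, c in enumerate(best_class_for_cluster):
        if best_cluster_for_class[c] == k:
            cluster_label[k] = c
    return cluster_label[cluster_ids]


# Toy textual prototypes for the source classes: horse (0), bike (1), bus (2).
text_protos = np.array([[1.0, 0.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 0.0]])

# Toy image prototypes (cluster centroids) from the target domain:
# a horse-like cluster, a bike-like cluster, and an airplane cluster.
image_protos = np.array([[0.9, 0.1, 0.0, 0.0],
                         [0.1, 0.9, 0.0, 0.0],
                         [0.2, 0.1, 0.1, 0.9]])

# Cluster index of each of five target samples.
cluster_ids = np.array([0, 0, 1, 2, 2])

labels = pseudo_labels(text_protos, image_protos, cluster_ids)
```

In this toy setup the bus prototype finds no mutually nearest cluster, so the source-private class receives no pseudo labels, and the airplane cluster is left as `UNKNOWN`.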