Adapting to Distribution Shift by Visual Domain Prompt Generation

Zhixiang Chi, Li Gu, Tao Zhong, Huan Liu, Yuanhao Yu, Konstantinos N Plataniotis, Yang Wang

TL;DR

This work addresses distribution shift under Few-Shot Test-Time Domain Adaptation by building adaptation on top of frozen foundation-model features. It introduces the Visual Domain Prompt Generator (VDPG), which learns a knowledge bank shared across source domains and generates a domain-specific prompt conditioned on a small batch of unlabeled target samples; a domain guidance module then fuses this prompt with the foundation-model features to direct domain-aware predictions. The approach uses episodic meta-learning and a domain-aware contrastive loss to extract transferable domain knowledge while keeping the backbone fixed and enabling on-device, gradient-free adaptation. Empirical results on 5 large-scale benchmarks, including WILDS and DomainNet, show state-of-the-art performance and strong robustness to distribution shifts, with improved efficiency over finetuning-based methods. The work demonstrates practical, scalable domain specialization for foundation models in real-world deployment scenarios with limited target data.

Abstract

In this paper, we aim to adapt a model at test time using a few unlabeled samples to address distribution shifts. To tackle the challenge of extracting domain knowledge from a limited amount of data, it is crucial to utilize correlated information from pre-trained backbones and source domains. Previous studies fail to utilize recent foundation models with strong out-of-distribution generalization. Additionally, domain-centric designs are not favored in their works. Furthermore, they treat modelling source domains and learning to adapt as independent processes, split into disjoint training stages. In this work, we propose an approach built on top of the pre-computed features of the foundation model. Specifically, we build a knowledge bank to learn the transferable knowledge from source domains. Conditioned on few-shot target data, we introduce a domain prompt generator to condense the knowledge bank into a domain-specific prompt. The domain prompt then directs the visual features towards a particular domain via a guidance module. Moreover, we propose a domain-aware contrastive loss and employ meta-learning to facilitate domain knowledge extraction. Extensive experiments are conducted to validate the domain knowledge extraction. The proposed method outperforms previous work on 5 large-scale benchmarks including WILDS and DomainNet.
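
To make the data flow in the abstract concrete, the following is a minimal NumPy sketch of the three pieces it names: a knowledge bank, a domain prompt generator conditioned on few-shot target features, and a guidance module. It is an illustration under stated assumptions, not the authors' implementation: the use of single-head cross-attention without learned projections, the residual connections, the function names, and all sizes are assumptions, and the random arrays stand in for learned parameters and pre-computed backbone features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Assumption: plain scaled dot-product attention with no learned projections.
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return weights @ keys_values

def generate_domain_prompt(knowledge_bank, support_features):
    # Condense the bank (Z, d) into a domain-specific prompt (Z, d),
    # conditioned on unlabeled target features (K, d).
    return knowledge_bank + cross_attention(knowledge_bank, support_features)

def guide_features(features, domain_prompt):
    # Steer image features (N, d) towards the domain described by the prompt.
    return features + cross_attention(features, domain_prompt)

rng = np.random.default_rng(0)
Z, d, K, N = 4, 8, 5, 3                      # hypothetical sizes
knowledge_bank = rng.normal(size=(Z, d))     # learned in the paper; random here
support_features = rng.normal(size=(K, d))   # stand-in for frozen backbone features
query_features = rng.normal(size=(N, d))

prompt = generate_domain_prompt(knowledge_bank, support_features)
guided = guide_features(query_features, prompt)
print(prompt.shape, guided.shape)
```

Both steps are forward passes only, which matches the idea of adapting from a few unlabeled samples without updating the backbone.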

Paper Structure

This paper contains 25 sections, 5 equations, 10 figures, 15 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Overview of the training pipeline of VDPG. Two disjoint sets, a support set and a query set, are sampled from a training domain. The support set is passed to a domain prompt generator, which condenses the learned knowledge bank into a domain-specific prompt. The generated prompt is then evaluated on the query set by guiding its features via a guidance module. Note that images and prompts of the same colour belong to the same domain.
  • Figure 2: a-b) t-SNE visualization of features before and after guidance. c-d) Comparison among generated domain prompts on 48 target domains in iWildCam.
  • Figure 3: Swapping the generated domain prompts with varying similarity for domains #40 and #47.
  • Figure 4: Comparison among different pairs of instances. L2 distance between generated prompts from a) samples of class 24 but from different domains; b) samples all from domain 8 but with different classes.
  • Figure 5: Correlation between each pair of $\textbf{b}_z$ in $\textbf{B}$. $\textbf{B}$ is taken after training on the iWildCam dataset with $Z=100$.
  • ...and 5 more figures
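
The episodic sampling described in the Figure 1 caption (two disjoint sets drawn from one training domain) can be sketched as follows. This is a hedged illustration using only the Python standard library; the function name `sample_episode`, the domain names, and the set sizes are hypothetical and do not come from the paper.

```python
import random

def sample_episode(domain_to_indices, support_size, query_size, rng):
    # Pick one training domain, then draw support and query indices
    # without replacement so the two sets cannot overlap.
    domain = rng.choice(sorted(domain_to_indices))
    picked = rng.sample(domain_to_indices[domain], support_size + query_size)
    return domain, picked[:support_size], picked[support_size:]

rng = random.Random(0)
# Hypothetical toy data: example indices grouped by source domain.
domains = {"domain_a": list(range(0, 20)), "domain_b": list(range(20, 40))}
domain, support, query = sample_episode(domains, support_size=4, query_size=6, rng=rng)
print(domain, len(support), len(query))
```

In a training loop following the caption, the support indices would feed the prompt generator and the query indices would be used to evaluate the generated prompt.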