Prompt-based Distribution Alignment for Unsupervised Domain Adaptation

Shuanghao Bai, Min Zhang, Wanqi Zhou, Siteng Huang, Zhirong Luan, Donglin Wang, Badong Chen

TL;DR

This work investigates unsupervised domain adaptation (UDA) using vision-language models (VLMs) by introducing Prompt-based Distribution Alignment (PDA). PDA employs a two-branch design: a base branch that uses multi-modal prompt tuning to produce discriminative representations, and an alignment branch that builds source/target feature banks and applies image-guided feature tuning (IFT) to reduce domain discrepancy. The method optimizes a combined contrastive objective with both source and pseudo-labeled target data, achieving state-of-the-art performance on Office-Home, Office-31, and VisDA-2017 while maintaining efficiency through prompt-based adaptation. The results demonstrate that domain-aware prompt learning, coupled with feature-bank–guided alignment, yields robust cross-domain transfer with practical implications for real-world UDA tasks.
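The use of pseudo-labeled target data can be illustrated with a minimal sketch. It assumes that a target sample is kept only when its highest softmax probability reaches a confidence threshold $\tau$; the function names, the default `tau=0.6`, and the `>=` comparison are illustrative assumptions, not details taken from the paper or its code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_pseudo_labels(batch_logits, tau=0.6):
    """Keep (index, label, confidence) for samples whose top probability >= tau.

    tau is a hypothetical default chosen for illustration only.
    """
    selected = []
    for i, logits in enumerate(batch_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= tau:
            selected.append((i, probs.index(confidence), confidence))
    return selected

# Toy target-domain logits: one confident sample, one ambiguous sample.
target_logits = [[4.0, 0.0, 0.0], [0.1, 0.0, 0.2]]
kept = select_pseudo_labels(target_logits, tau=0.6)
```

In this toy batch only the first sample passes the threshold, so only it would contribute to a target-domain loss term.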

Abstract

Despite the unprecedented success of large pre-trained visual-language models (VLMs) on a wide range of downstream tasks, the real-world unsupervised domain adaptation (UDA) problem remains underexplored. In this paper, we first demonstrate experimentally that VLMs trained without supervision can significantly reduce the distribution discrepancy between source and target domains, thereby improving UDA performance. However, a major challenge in directly deploying such models on downstream UDA tasks is prompt engineering, which requires aligning the domain knowledge of the source and target domains, since UDA performance depends heavily on a good domain-invariant representation. We therefore propose Prompt-based Distribution Alignment (PDA), a method that incorporates domain knowledge into prompt learning. Specifically, PDA employs a two-branch prompt-tuning paradigm consisting of a base branch and an alignment branch. The base branch integrates class-related representations into prompts, ensuring discrimination among classes. To further minimize domain discrepancy, the alignment branch constructs feature banks for both the source and target domains and uses image-guided feature tuning (IFT) to make the input attend to the feature banks, which effectively integrates self-enhanced and cross-domain features into the model. In this way, the two branches promote each other to enhance the adaptation of VLMs for UDA. Extensive experiments on three benchmarks demonstrate that PDA achieves state-of-the-art performance. The code is available at https://github.com/BaiShuanghao/Prompt-based-Distribution-Alignment.
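The idea of making an input "attend to feature banks" can be sketched as plain scaled dot-product attention. This is a simplified illustration, not the authors' IFT module: it omits any learned projections, and the function name `attend_to_bank`, the residual addition, and the toy two-dimensional bank are all assumptions made for the example.

```python
import math

def attend_to_bank(feature, bank, temperature=1.0):
    """Fuse an image feature with a feature bank via softmax attention.

    Returns (fused_feature, attention_weights). The residual add and the
    absence of learned projections are simplifying assumptions.
    """
    d = len(feature)
    scale = math.sqrt(d) * temperature
    # Similarity of the input feature to every bank entry.
    scores = [sum(f * b for f, b in zip(feature, entry)) / scale for entry in bank]
    # Numerically stable softmax over the bank entries.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of bank entries, added back to the input.
    attended = [sum(w * entry[i] for w, entry in zip(weights, bank)) for i in range(d)]
    fused = [f + a for f, a in zip(feature, attended)]
    return fused, weights

# Hypothetical bank with two stored features.
source_bank = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = attend_to_bank([2.0, 0.0], source_bank)
```

Attending to a bank built from the input's own domain would correspond to the "self-enhanced" features mentioned above, and attending to the other domain's bank to the "cross-domain" features; how the two are combined in PDA is not specified in this summary.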

Paper Structure (26 sections, 9 equations, 6 figures, 11 tables)


Figures (6)

  • Figure 1: Metric comparisons on Office-Home. Higher values are better. $r$ measures the compactness of features (i.e., the division of inner-class $L_2$ distance and inter-class $L_2$ distance $L^{inter}_2$). MMD and KL divergence measure the domain discrepancy. $T$, $I_s$ and $I_t$ denote the text features, and image features of the source and target domain, respectively. Our method demonstrates the most discriminable text features, the most compact image features, the lowest domain discrepancy, and the best accuracy.
  • Figure 2: Overview of the proposed Prompt-based Distribution Alignment (PDA) method. The snowflake icon denotes frozen parameters, and the flame icon denotes learnable parameters. From left to right, the figure shows the detailed framework of PDA and the architecture of the IFT module. PDA mainly adopts multi-modal prompt tuning. Additionally, the IFT module makes the visual features attend to the source/target-domain feature banks for domain alignment.
  • Figure 3: The t-SNE visualization for different tasks on the three datasets with zero-shot CLIP, MaPLe and our PDA method. Image features extracted from the source and target domain are shown in blue and red, respectively.
  • Figure 4: Sensitivity analysis of the context token length (left) and pseudo label threshold $\tau$ (right) on three datasets.
  • Figure 5: Performance comparison on the Office-Home dataset. With far fewer parameters, our PDA method outperforms all other UDA methods.
  • ...and 1 more figure
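
The Figure 1 caption uses MMD as one measure of domain discrepancy. As a minimal sketch of what such a measure computes, the example below uses the linear-kernel form, where the squared MMD reduces to the squared distance between the mean source feature and the mean target feature. The choice of a linear kernel, the function names, and the toy features are assumptions for illustration; the caption does not state which kernel the paper uses.

```python
def mean_vector(features):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def linear_mmd_squared(source, target):
    """Squared MMD under a linear kernel: ||mean(source) - mean(target)||^2."""
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))

# Toy image features for a source and a target domain.
source_feats = [[0.0, 0.0], [2.0, 0.0]]
target_feats = [[1.0, 1.0], [1.0, 3.0]]
gap = linear_mmd_squared(source_feats, target_feats)
```

A value of zero means the two feature sets share the same mean, so smaller values indicate better-aligned domains under this particular kernel.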