
Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation

Haoran Chen, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

TL;DR

This work tackles multi-source unsupervised domain adaptation (UDA) by replacing the single shared feature extractor with a modular, prompt-based approach built on CLIP. It introduces Multi-Prompt Alignment (MPA), which learns a separate prompt for each source–target pair, then denoises these prompts with an auto-encoder and aligns them in a common latent space with an $L_1$ alignment loss. A complementary Latent Subspace Tuning (LST) strategy adapts to new target domains by tuning within the learned low-dimensional prompt subspace, which reduces training cost. Experiments on ImageCLEF, Office-Home, and DomainNet show state-of-the-art results, including an average accuracy of $54.1\%$ on DomainNet, while using far fewer tunable parameters (roughly one-third on DomainNet, per Figure 1). Overall, MPA offers a scalable, efficient alternative for multi-source UDA and shows the practical value of prompting large pre-trained vision-language models for domain adaptation, with LST extending it to a streamlined set of target domains.

Abstract

Most existing methods for unsupervised domain adaptation (UDA) rely on a shared network to extract domain-invariant features. However, when facing multiple source domains, optimizing such a network involves updating the parameters of the entire network, making it both computationally expensive and challenging, particularly when coupled with min-max objectives. Inspired by recent advances in prompt learning that adapts high-capacity models for downstream tasks in a computationally economic way, we introduce Multi-Prompt Alignment (MPA), a simple yet efficient framework for multi-source UDA. Given a source and target domain pair, MPA first trains an individual prompt to minimize the domain gap through a contrastive loss. Then, MPA denoises the learned prompts through an auto-encoding process and aligns them by maximizing the agreement of all the reconstructed prompts. Moreover, we show that the resulting subspace acquired from the auto-encoding process can easily generalize to a streamlined set of target domains, making our method more efficient for practical usage. Extensive experiments show that MPA achieves state-of-the-art results on three popular datasets with an impressive average accuracy of 54.1% on DomainNet.
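
The alignment step described in the abstract ("maximizing the agreement of all the reconstructed prompts") can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the numbers are made up, and it assumes that agreement is measured as the mean pairwise $L_1$ distance between one output vector per reconstructed prompt (for example, the class probabilities each prompt yields for the same target image).

```python
from itertools import combinations


def l1_distance(u, v):
    """Sum of absolute element-wise differences between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def pairwise_l1_disagreement(outputs):
    """Mean L1 distance over all unordered pairs of per-prompt outputs.

    `outputs` holds one vector per reconstructed prompt (hypothetical input,
    e.g. the class probabilities each prompt produces for one target image).
    A value of zero means every prompt agrees exactly.
    """
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    return sum(l1_distance(u, v) for u, v in pairs) / len(pairs)


# Three prompts scoring one target image over three classes (made-up numbers).
preds = [
    [0.5, 0.25, 0.25],
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
]
disagreement = pairwise_l1_disagreement(preds)  # two of three pairs differ by 0.5
```

Minimizing a quantity of this kind pushes the prompts toward consistent outputs; in the example, the third prompt is the only source of disagreement.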

Paper Structure (24 sections, 10 equations, 3 figures, 7 tables)


Figures (3)

  • Figure 1: (a) Most conventional multi-source UDA methods use a common feature extractor with domain-specific classifier heads, while we introduce prompt learning to multi-source UDA. (b) MPA outperforms all other multi-source UDA methods by a large margin on the DomainNet dataset with roughly one-third of the tunable parameters. We also introduce an LST strategy for continuous adaptation to a streamlined set of target domains, which further reduces the number of tunable parameters while retaining high accuracy relative to MPA. See the text for more details.
  • Figure 2: Each source–target pair prompt $\bm{P}_i$ is the concatenation of a "source prompt" segment and a "target prompt" segment, both composed of domain-invariant and domain-specific features. Therefore, $\bm{P}_i \in \mathbb{R}^{2K \times (M_1 + M_2) \times 512}$. During our prompt training step, the text encoder and the image encoder of CLIP are both frozen.
  • Figure 3: (a) Example of prompt alignment on the Office-Home dataset. Here, $\bm{P}_1, \bm{P}_2, \bm{P}_3$ are the prompts for the source-target pairs Ar-Rw, Cl-Rw, and Pr-Rw, respectively. All prompts are projected into the same latent space for alignment by an auto-encoder structure. (b) When facing a new target domain, tuning the latent subspace learned by the auto-encoder in MPA allows quick adaptation that is more computationally efficient.
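
To make the prompt size stated in the Figure 2 caption concrete, the following sketch computes the tensor shape $2K \times (M_1 + M_2) \times 512$ and its number of entries. It is illustrative only: `prompt_shape` is a hypothetical helper, $K = 65$ is the number of Office-Home classes, and the values $M_1 = M_2 = 16$ are assumptions chosen for the example, not settings taken from the paper. The entry count is the size of the tensor and need not equal the number of independently tunable parameters.

```python
from math import prod


def prompt_shape(num_classes, m1, m2, embed_dim=512):
    """Shape of one source-target pair prompt, following the Figure 2 caption.

    The factor 2 covers the "source prompt" and "target prompt" segments;
    m1 and m2 are the two token counts that the caption writes as M_1 and M_2.
    """
    return (2 * num_classes, m1 + m2, embed_dim)


# K = 65 (Office-Home classes); m1 = m2 = 16 are assumed values for illustration.
shape = prompt_shape(num_classes=65, m1=16, m2=16)
entries = prod(shape)  # total number of scalar entries in the prompt tensor
```

With these assumed values the shape is `(130, 32, 512)`, which gives 2,129,920 entries per prompt.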