Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation
Haoran Chen, Xintong Han, Zuxuan Wu, Yu-Gang Jiang
TL;DR
This work tackles multi-source unsupervised domain adaptation by moving away from a single shared feature extractor to a modular prompt-based approach built on CLIP. It introduces Multi-Prompt Alignment (MPA), which learns a separate prompt for each source–target pair, then denoises and aligns these prompts in a common latent space via an auto-encoder and an $L_1$ alignment loss. A complementary Latent Subspace Tuning (LST) strategy enables efficient adaptation to new target domains by navigating a learned low-dimensional subspace of prompts, significantly reducing training cost. Experiments on ImageCLEF, Office-Home, and DomainNet show state-of-the-art results with DomainNet achieving an average accuracy of $54.1\%$, while dramatically lowering the number of tunable parameters and enabling faster deployment in practice. Overall, MPA offers a scalable, efficient alternative for multi-source UDA and demonstrates the practical value of prompting large pre-trained vision-language models for domain adaptation tasks, with LST further enhancing adaptability to streamlined target sets.
Abstract
Most existing methods for unsupervised domain adaptation (UDA) rely on a shared network to extract domain-invariant features. However, when facing multiple source domains, optimizing such a network involves updating the parameters of the entire network, making it both computationally expensive and challenging, particularly when coupled with min-max objectives. Inspired by recent advances in prompt learning that adapts high-capacity models for downstream tasks in a computationally economic way, we introduce Multi-Prompt Alignment (MPA), a simple yet efficient framework for multi-source UDA. Given a source and target domain pair, MPA first trains an individual prompt to minimize the domain gap through a contrastive loss. Then, MPA denoises the learned prompts through an auto-encoding process and aligns them by maximizing the agreement of all the reconstructed prompts. Moreover, we show that the resulting subspace acquired from the auto-encoding process can easily generalize to a streamlined set of target domains, making our method more efficient for practical usage. Extensive experiments show that MPA achieves state-of-the-art results on three popular datasets with an impressive average accuracy of 54.1% on DomainNet.
