A Tale of Two Experts: Cooperative Learning for Source-Free Unsupervised Domain Adaptation
Jiaping Yu, Muli Yang, Jiapeng Ji, Jiexi Yan, Cheng Deng
TL;DR
Source-Free Unsupervised Domain Adaptation (SFUDA) requires adapting a model trained on a labeled source domain while the source data is unavailable. The paper proposes Experts Cooperative Learning (EXCL), which combines a Dual Experts framework (frozen source model with Conv-Adapter and a Vision-Language Model with a trainable text prompt) with a Retrieval-Augmented-Interaction (RAIN) pipeline to learn from unlabeled target data. It introduces specific losses, including a Weiszfeld-style loss for style alignment ($L_{weisz}$), a Prompt Semantic Consistency loss ($L_{psc}$), and a Mutual Information loss ($L_{mi}$), to enable unsupervised cooperation between the two experts. Empirical results on four benchmarks show EXCL achieving state-of-the-art or competitive performance across SFUDA settings, demonstrating the value of cooperative, data-retrieval–driven adaptation without source data.
Abstract
Source-Free Unsupervised Domain Adaptation (SFUDA) addresses the realistic challenge of adapting a source-trained model to a target domain without access to the source data, driven by concerns over privacy and cost. Existing SFUDA methods either exploit only the source model's predictions or fine-tune large multimodal models, yet both neglect complementary insights and the latent structure of target data. In this paper, we propose the Experts Cooperative Learning (EXCL). EXCL contains the Dual Experts framework and Retrieval-Augmentation-Interaction optimization pipeline. The Dual Experts framework places a frozen source-domain model (augmented with Conv-Adapter) and a pretrained vision-language model (with a trainable text prompt) on equal footing to mine consensus knowledge from unlabeled target samples. To effectively train these plug-in modules under purely unsupervised conditions, we introduce Retrieval-Augmented-Interaction(RAIN), a three-stage pipeline that (1) collaboratively retrieves pseudo-source and complex target samples, (2) separately fine-tunes each expert on its respective sample set, and (3) enforces learning object consistency via a shared learning result. Extensive experiments on four benchmark datasets demonstrate that our approach matches state-of-the-art performance.
