Empowering Source-Free Domain Adaptation via MLLM-Guided Reliability-Based Curriculum Learning
Dongjie Chen, Kartik Patwari, Zhengfeng Lai, Xiaoguang Zhu, Sen-ching Cheung, Chen-Nee Chuah
TL;DR
This work tackles source-free domain adaptation (SFDA) by leveraging multiple frozen multimodal large language models (MLLMs) as diverse teachers. It introduces Reliability-based Curriculum Learning (RCL), which converts open-ended MLLM outputs into pseudo-labels via Semantic Textual Similarity (STS) and estimates pseudo-label reliability with a consensus-based score, partitioning the target data into reliable, less reliable, and unreliable sets. RCL then trains a lightweight student in three stages, Reliable Knowledge Transfer (RKT), Self-correcting and MLLM-guided Knowledge Expansion (SMKE), and Multi-hot Masking Refinement (MMR), so that every target sample contributes to training while label noise is mitigated. The approach achieves state-of-the-art results on Office-Home, DomainNet-126, and VisDA-C, while reducing model size by up to several orders of magnitude and avoiding both source-data access and foundation-model fine-tuning. Overall, RCL demonstrates how foundation-model supervision can be distilled into practical, efficient SFDA models with broad applicability and robustness to teacher quality.
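The consensus-based partition described above can be illustrated with a minimal sketch. It assumes each MLLM teacher's output has already been mapped to a class name (the STS step is not shown), and it assumes the reliability score is the fraction of teachers that agree with the majority label. The function names and the thresholds (full agreement for "reliable", a strict majority for "less reliable") are hypothetical choices for illustration, not the authors' exact definitions.

```python
from collections import Counter


def reliability(pseudo_labels):
    """Return (majority_label, agreement_fraction) for one target sample.

    pseudo_labels holds one class name per teacher. Scoring by majority
    agreement is an assumption made for this sketch.
    """
    counts = Counter(pseudo_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(pseudo_labels)


def partition(samples):
    """Split samples into reliable, less reliable, and unreliable sets.

    samples maps a sample id to the list of teacher pseudo-labels.
    Thresholds are illustrative: all teachers agree -> reliable,
    a strict majority agrees -> less reliable, otherwise unreliable.
    """
    reliable, less_reliable, unreliable = [], [], []
    for sample_id, labels in samples.items():
        label, score = reliability(labels)
        if score == 1.0:
            reliable.append((sample_id, label))
        elif score > 0.5:
            less_reliable.append((sample_id, label))
        else:
            # No majority: keep every candidate class proposed by the teachers.
            unreliable.append((sample_id, sorted(set(labels))))
    return reliable, less_reliable, unreliable


if __name__ == "__main__":
    toy = {
        "img0": ["chair", "chair", "chair"],
        "img1": ["desk", "table", "desk"],
        "img2": ["mug", "bottle", "cup"],
    }
    for name, subset in zip(("reliable", "less reliable", "unreliable"), partition(toy)):
        print(name, subset)
```

Keeping the full candidate list for samples without a majority is one way to retain information that a later refinement stage could use, for example as a mask over the classes the teachers proposed.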
Abstract
Existing source-free domain adaptation (SFDA) methods struggle to fully use pre-trained knowledge and often rely on a single model's predictions or handcrafted prompts, limiting robustness under domain shift. Multimodal Large Language Models (MLLMs) offer a promising alternative: they encode rich visual-semantic knowledge and generalize well without task-specific tuning. However, their use in SFDA is hindered by instruction-following failures, inconsistent outputs, and high inference costs. We propose Reliability-based Curriculum Learning (RCL), a novel framework that distills robust supervision from multiple frozen MLLMs into a compact target model. RCL organizes adaptation as a three-stage curriculum that progressively incorporates pseudo-labels based on inter-model agreement and model confidence, enabling stable and noise-aware training. Our approach achieves state-of-the-art performance on the standard SFDA benchmarks Office-Home, DomainNet-126, and VisDA-C, outperforming zero-shot MLLMs and their ensembles, all without accessing source data or tuning foundation models. Our code is available at: https://github.com/Dong-Jie-Chen/RCL.
