Empowering Source-Free Domain Adaptation via MLLM-Guided Reliability-Based Curriculum Learning

Dongjie Chen, Kartik Patwari, Zhengfeng Lai, Xiaoguang Zhu, Sen-ching Cheung, Chen-Nee Chuah

TL;DR

This work tackles source-free domain adaptation (SFDA) by using multiple frozen multimodal large language models (MLLMs) as diverse teachers. It introduces Reliability-based Curriculum Learning (RCL), which converts open-ended MLLM outputs into pseudo-labels via Semantic Textual Similarity (STS), estimates pseudo-label reliability with a consensus-based score, and partitions the target data into reliable, less reliable, and unreliable subsets. RCL then trains a lightweight student in three stages that extract information from all target samples while mitigating label noise: Reliable Knowledge Transfer (RKT), Self-correcting and MLLM-guided Knowledge Expansion (SMKE), and Multi-hot Masking Refinement (MMR). The approach achieves state-of-the-art results on Office-Home, DomainNet-126, and VisDA-C while reducing model size by up to several orders of magnitude, without accessing source data or fine-tuning any foundation model. Overall, RCL shows how foundation-model supervision can be distilled into practical, efficient SFDA models with broad applicability and robustness to teacher quality.
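
The pseudo-labeling and partitioning steps summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `sts_label` uses a bag-of-words cosine similarity as a hypothetical stand-in for the sentence-embedding model an STS step would normally use. `partition` replaces the paper's consensus-based reliability score with a simple vote count: if all teachers agree, the sample is reliable; if at least two but not all agree, it is less reliable; otherwise it is unreliable.

```python
import math
import re
from collections import Counter


def _bag_of_words(text: str) -> Counter:
    # Hypothetical stand-in for a sentence-embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sts_label(answer: str, class_names: list) -> str:
    """Map a free-form MLLM answer to the most similar class name."""
    answer_vec = _bag_of_words(answer)
    return max(
        class_names,
        key=lambda name: _cosine(answer_vec, _bag_of_words(name.replace("_", " "))),
    )


def partition(labels_per_teacher: list):
    """Split target samples by teacher agreement.

    labels_per_teacher[t][i] is teacher t's pseudo-label for sample i.
    Assumed rule: all teachers agree -> reliable; at least two (but not all)
    agree -> less reliable; otherwise -> unreliable (keep every candidate).
    """
    num_teachers = len(labels_per_teacher)
    reliable, less_reliable, unreliable = [], [], []
    for i, votes in enumerate(zip(*labels_per_teacher)):
        label, count = Counter(votes).most_common(1)[0]
        if count == num_teachers:
            reliable.append((i, label))
        elif count >= 2:
            less_reliable.append((i, label))
        else:
            unreliable.append((i, list(votes)))
    return reliable, less_reliable, unreliable
```

With three teachers, a sample labeled ("cat", "cat", "cat") lands in the reliable set, ("dog", "dog", "bird") in the less-reliable set with label "dog", and ("cat", "dog", "fox") in the unreliable set with all three candidates kept. A bag-of-words similarity cannot match synonyms such as "bicycle" and "bike", which is why an embedding-based similarity model is the natural choice for a real STS step.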

Abstract

Existing source-free domain adaptation (SFDA) methods struggle to fully exploit pre-trained knowledge and often rely on a single model's predictions or on handcrafted prompts, which limits robustness under domain shift. Multimodal Large Language Models (MLLMs) offer a promising alternative: they encode rich visual-semantic knowledge and generalize well without task-specific tuning. However, their use in SFDA is hindered by instruction-following failures, inconsistent outputs, and high inference costs. We propose Reliability-based Curriculum Learning (RCL), a novel framework that distills robust supervision from multiple frozen MLLMs into a compact target model. RCL organizes adaptation as a three-stage curriculum that progressively incorporates pseudo-labels based on inter-model agreement and model confidence, enabling stable and noise-aware training. Our approach achieves state-of-the-art performance on the standard SFDA benchmarks Office-Home, DomainNet-126, and VisDA-C, outperforming zero-shot MLLMs and their ensembles, all without accessing source data or tuning foundation models. Our code is available at: https://github.com/Dong-Jie-Chen/RCL.
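
The progressive schedule described in the abstract can be pictured as a rule that decides which samples, with which targets, enter the supervised loss at each stage. The sketch below is hypothetical. The `TargetSample` fields, the threshold `tau = 0.9`, and the rule that a confident student prediction replaces the teacher label for less-reliable samples are one plausible reading of "inter-model agreement and model confidence", not the paper's exact procedure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TargetSample:
    index: int
    subset: str                   # "reliable", "less_reliable" or "unreliable"
    teacher_label: Optional[str]  # consensus MLLM label; None if teachers disagree
    student_label: str            # current prediction of the student model
    student_conf: float           # student's max softmax probability


def stage_targets(samples: List[TargetSample], stage: int, tau: float = 0.9) -> List[Tuple[int, str]]:
    """Return (index, hard target) pairs for the supervised loss at a curriculum stage.

    Hypothetical schedule:
      stage 1 (RKT):  reliable samples only, with the consensus label;
      stage 2 (SMKE): also less-reliable samples, where a confident student
                      (conf >= tau) overrides the teacher label;
      stage 3 (MMR):  same hard targets as stage 2; unreliable samples would be
                      trained with a separate masked loss (not shown here).
    """
    if stage not in (1, 2, 3):
        raise ValueError("stage must be 1, 2 or 3")
    targets = []
    for s in samples:
        if s.subset == "reliable":
            targets.append((s.index, s.teacher_label))
        elif s.subset == "less_reliable" and stage >= 2:
            confident = s.student_conf >= tau
            targets.append((s.index, s.student_label if confident else s.teacher_label))
    return targets
```

For example, a less-reliable sample whose teacher label is "dog" but whose student prediction is "fox" with confidence 0.97 is trained toward "fox" from stage 2 onward. In this sketch, unreliable samples never receive a hard target.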

Paper Structure (33 sections, 8 equations, 10 figures, 18 tables, 1 algorithm)

Figures (10)

  • Figure 1: Comparison of existing methods, zero-shot MLLMs (using the proposed STS), and RCL on the Office-Home dataset. RCL achieves state-of-the-art results across domains while remaining lightweight.
  • Figure 2: Directly prompting MLLMs for classification can lead to failures; to address this, we propose semantic textual similarity (STS), introduced in the corresponding subsection.
  • Figure 3: Pseudo-label accuracy and distribution across MLLMs in the Office-Home (a) Clipart and (b) Art domains (65 classes each).
  • Figure 4: Overview of our Reliability-based Curriculum Learning (RCL) framework. RCL applies curriculum learning over the target data by progressively incorporating MLLM pseudo-labels according to their reliability. (1) RKT uses high-confidence samples on which all MLLMs agree to initialize feature learning. (2) SMKE integrates samples with partial MLLM agreement to expand knowledge and correct early errors. (3) MMR learns from uncertain samples using multi-hot masking and a consistency loss. This structured curriculum makes use of all target data while reducing label noise.
  • Figure 5: t-SNE feature distributions for A→C (Art → Clipart) in Office-Home. DIFO-C-B32 uses ViT-B32; the other methods use ResNet-50.
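
The MMR step in the Figure 4 caption can be illustrated with a small sketch. It rests on three hypothetical assumptions: the multi-hot mask marks every class proposed by at least one MLLM for an unreliable sample, the student's softmax is restricted to those classes, and the consistency loss is a cross-entropy between the masked predictions on two augmented views. The function names and the exact loss are assumptions for illustration, not the paper's definitions.

```python
import math
from typing import List, Sequence


def multi_hot(candidates: Sequence[str], class_names: Sequence[str]) -> List[int]:
    """1 for every class proposed by at least one teacher, else 0."""
    proposed = set(candidates)
    return [1 if name in proposed else 0 for name in class_names]


def masked_softmax(logits: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Softmax restricted to classes with mask == 1; masked classes get probability 0."""
    kept = [z for z, m in zip(logits, mask) if m]
    if not kept:
        raise ValueError("mask must keep at least one class")
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(z - top) if m else 0.0 for z, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(p_target: Sequence[float], p_pred: Sequence[float], eps: float = 1e-12) -> float:
    """Cross-entropy of one view's prediction against another view's distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(p_target, p_pred))
```

With classes ["cat", "dog", "fox", "owl"], candidates {"cat", "fox"} and logits [2.0, 5.0, 2.0, 1.0], the masked softmax is [0.5, 0.0, 0.5, 0.0]: "dog" is zeroed out even though it has the largest logit, so the student can only spread probability over classes the teachers actually proposed. In a real training loop, the target distribution would usually come from a weakly augmented view and be treated as a constant (no gradient); this plain-Python sketch does not model that.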