Collaborative Learning with Multiple Foundation Models for Source-Free Domain Adaptation

Huisoo Lee, Jisu Han, Hyunsouk Cho, Wonjun Hwang

TL;DR

CoMA employs a bidirectional adaptation mechanism that aligns different foundation models (FMs) with the target model for task adaptation while maintaining their semantic distinctiveness, and transfers complementary knowledge from the FMs to the target model.

Abstract

Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to an unlabeled target domain without access to source data. Recent advances in Foundation Models (FMs) have introduced new opportunities for leveraging external semantic knowledge to guide SFDA. However, relying on a single FM is often insufficient, as it tends to bias adaptation toward restricted semantic coverage and fails to capture diverse contextual cues under domain shift. To overcome this limitation, we propose a Collaborative Multi-foundation Adaptation (CoMA) framework that jointly leverages two different FMs (e.g., CLIP and BLIP) with complementary properties to capture both global semantics and local contextual cues. Specifically, we employ a bidirectional adaptation mechanism that (1) aligns different FMs with the target model for task adaptation while maintaining their semantic distinctiveness, and (2) transfers complementary knowledge from the FMs to the target model. To ensure stable adaptation under mini-batch training, we introduce Decomposed Mutual Information (DMI), which selectively enhances true dependencies while suppressing false dependencies arising from incomplete class coverage. Extensive experiments demonstrate that our method consistently outperforms existing state-of-the-art SFDA methods across four benchmarks (Office-31, Office-Home, DomainNet-126, and VisDA) under the closed-set setting, while also achieving the best results on the partial-set and open-set variants.

Paper Structure

This paper contains 18 sections, 1 theorem, 22 equations, 8 figures, 13 tables, 1 algorithm.

Key Result

Proposition 1

For any candidate class subset $\mathcal{S}$ satisfying $|\mathcal{S}| \ge 2$ and $|\mathcal{S}^{\complement}| \ge 2$, the Decomposed Mutual Information $I_D(X;Y)$ satisfies the following bounded condition:

Figures (8)

  • Figure 1: While prior SFDA methods typically employ a single FM, we jointly leverage two different FMs with complementary semantic properties. The visualization illustrates how FM I (e.g., CLIP) captures global, category-level semantics (e.g., "tram"), whereas FM II (e.g., BLIP) captures local contextual semantics with fine-grained details (e.g., "wheels"). The target model guides FMs toward task-relevant semantics and then obtains their refined signals for improving its prediction.
  • Figure 2: Overview of CoMA. Our method begins with a burn-in phase that trains a BLIP-proxy model initialized from the source model. After this phase, the target model is also initialized from the source model. The target model then undergoes two bidirectional stages: (1) TCA aligns two complementary MFMs with task-relevant semantics while preserving their semantic distinctness, and (2) MDA transfers reliable knowledge from MFMs into the target model. $H(p,q)$ denotes cross-entropy.
  • Figure 3: Concept of our DMI method. Conventional MI maximization reinforces dependencies for all class pairs including those absent ($\mathcal{S}^{\complement}$: Plane, Bicycle) from the batch, causing false dependencies. In contrast, our DMI maximization enhances true dependencies within the confident joint region formed by classes present ($\mathcal{S}$: Bus, Car, Truck) in the batch, while suppressing false ones in the uncertain region with absent classes $\mathcal{S}^{\complement}$.
  • Figure 4: t-SNE feature visualization on the transfer task Cl$\to$Ar in Office-Home. Each color corresponds to one of the 65 object categories.
  • Figure 5: Batch size sensitivity of DMI on the transfer task Cl$\to$Ar in Office-Home. Left: ProDe-V. Right: our CoMA. DMI maintains stable performance across batch sizes, effectively mitigating the degradation observed with standard MI.
  • ...and 3 more figures
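
The contrast drawn in Figure 3 can be illustrated with a small numerical sketch. This is not the paper's DMI objective, whose formula is not reproduced on this page. It only assumes the common batch estimate of mutual information from the class-pair joint of two models' predictions, and then splits that sum into the block of classes present in the batch ($\mathcal{S} \times \mathcal{S}$) and the remaining cells that involve absent classes. All function names, the toy predictions, and the choice of $\mathcal{S}$ as the set of arg-max classes are hypothetical.

```python
import numpy as np


def joint_distribution(p, q):
    """Estimate the class-pair joint from a batch: P[i, j] = mean_b p[b, i] * q[b, j]."""
    joint = p.T @ q / p.shape[0]
    return joint / joint.sum()  # renormalize to guard against rounding drift


def mi_terms(joint, eps=1e-12):
    """Per-cell terms P_ij * log(P_ij / (P_i. * P_.j)); their sum is the batch MI estimate."""
    row = joint.sum(axis=1, keepdims=True)  # marginal of the first model's classes
    col = joint.sum(axis=0, keepdims=True)  # marginal of the second model's classes
    return joint * (np.log(joint + eps) - np.log(row @ col + eps))


def split_by_subset(terms, subset):
    """Split the MI sum into the S x S block and all cells touching absent classes."""
    mask = np.zeros(terms.shape[0], dtype=bool)
    mask[subset] = True
    block = mask[:, None] & mask[None, :]
    return terms[block].sum(), terms[~block].sum()


def peaked(labels, num_classes, peak):
    """Hypothetical toy predictions: `peak` mass on the label, the rest spread evenly."""
    probs = np.full((len(labels), num_classes), (1.0 - peak) / (num_classes - 1))
    probs[np.arange(len(labels)), labels] = peak
    return probs


# Toy batch: 5 classes overall, but only classes 0, 1, 2 appear in this mini-batch.
labels = np.array([0, 1, 2, 0, 1, 2])
p = peaked(labels, num_classes=5, peak=0.9)  # stand-in for target-model predictions
q = peaked(labels, num_classes=5, peak=0.8)  # stand-in for FM predictions

subset = np.unique(p.argmax(axis=1))  # assumed candidate subset S: classes seen in the batch
terms = mi_terms(joint_distribution(p, q))
total = terms.sum()
inside, outside = split_by_subset(terms, subset)

print(f"S = {subset.tolist()}")
print(f"MI total = {total:.4f}, S x S block = {inside:.4f}, remaining cells = {outside:.4f}")
```

Maximizing the total treats every cell alike, including those for classes the batch never shows. Separating the two parts makes it possible to treat the block over present classes differently from the rest, which is the intuition the figure describes.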

Theorems & Definitions (2)

  • Definition 1
  • Proposition 1