Consistent Assistant Domains Transformer for Source-free Domain Adaptation

Renrong Shao, Wei Zhang, Kangyang Luo, Qin Li, and Jun Wang

TL;DR

The paper tackles source-free domain adaptation (SFDA) by introducing CADTrans, which builds a plug-in assistant domain module (ADM) from the aggregated global attention of a Vision Transformer to derive invariant features. It then applies domain consistency strategies to separate easy, source-like samples from hard, target-specific samples, and uses a conditional multi-kernel MMD (CMK-MMD) to align the hard samples with the easy ones. The authors report performance gains on Office-31, Office-Home, VISDA-C, and DomainNet-126, which they attribute to the assistant domain, self-distillation, and sample-wise alignment. The framework reduces domain shift without accessing source data and may extend to other vision tasks and to lighter-weight adaptations.

Abstract

Source-free domain adaptation (SFDA) aims to address the challenge of adapting to a target domain without accessing the source domain directly. However, because the source domain data are inaccessible, deterministic invariant features cannot be obtained. Current mainstream methods primarily focus on evaluating invariant features in the target domain that closely resemble those in the source domain, subsequently aligning the target domain with the source domain. However, these methods are susceptible to hard samples and influenced by domain bias. In this paper, we propose a Consistent Assistant Domains Transformer for SFDA, abbreviated as CADTrans, which addresses the issue by constructing invariant feature representations through domain consistency. Concretely, we develop an assistant domain module for CADTrans to obtain diversified representations from the intermediate aggregated global attentions, which addresses the limitation of existing methods in adequately representing diversity. Based on the assistant and target domains, invariant feature representations are obtained through multiple consistency strategies, and these representations are used to distinguish easy and hard samples. Finally, to align the hard samples with the corresponding easy samples, we construct a conditional multi-kernel maximum mean discrepancy (CMK-MMD) strategy to distinguish between samples of the same category and those of different categories. Extensive experiments on benchmarks such as Office-31, Office-Home, VISDA-C, and DomainNet-126 demonstrate the significant performance improvements achieved by our proposed approach. Code is available at https://github.com/RoryShao/CADTrans.git.
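To make the alignment idea in the abstract more concrete, the following is a minimal sketch of a class-conditional multi-kernel MMD between easy and hard feature sets. It is not the authors' CMK-MMD: the abstract does not give the formula, so the Gaussian kernel family, the bandwidths, the biased estimator, and the choice to average only same-class terms are all assumptions, as are the function names.

```python
import math

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def multi_kernel(x, y, bandwidths):
    """Uniform average of Gaussian kernels (assumed kernel family)."""
    return sum(gaussian_kernel(x, y, b) for b in bandwidths) / len(bandwidths)

def mk_mmd2(xs, ys, bandwidths=(0.5, 1.0, 2.0)):
    """Biased estimate of the squared MMD between two sample sets."""
    def mean_k(a, b):
        total = sum(multi_kernel(u, v, bandwidths) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)

def conditional_mk_mmd2(easy, easy_labels, hard, hard_labels,
                        bandwidths=(0.5, 1.0, 2.0)):
    """Average per-class MK-MMD^2 over classes present in both sets.

    Assumption: only same-class (easy, hard) pairs are compared; the
    paper's CMK-MMD may also include terms for different classes.
    """
    classes = sorted(set(easy_labels) & set(hard_labels))
    if not classes:
        return 0.0
    total = 0.0
    for c in classes:
        e = [f for f, l in zip(easy, easy_labels) if l == c]
        h = [f for f, l in zip(hard, hard_labels) if l == c]
        total += mk_mmd2(e, h, bandwidths)
    return total / len(classes)

# Toy usage: hard samples of class 0 lie far from easy samples of class 0.
easy_feats = [[0.0, 0.0], [0.1, 0.0]]
hard_feats = [[3.0, 3.0], [3.1, 3.0]]
loss = conditional_mk_mmd2(easy_feats, [0, 0], hard_feats, [0, 0])
```

In a training loop, a quantity of this kind would be minimized with respect to the hard-sample features so that each hard sample moves toward the easy samples sharing its (pseudo-)label.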

Paper Structure

This paper contains 18 sections, 11 equations, 7 figures, 7 tables, 2 algorithms.

Figures (7)

  • Figure 1: Principle of our proposed methodology. Left: easy samples of the target domain are more similar to the source data, while hard samples show large discrepancies. Right: an assistant domain is constructed to obtain invariant features through domain consistency strategies, which are used to identify easy samples and align hard samples.
  • Figure 2: The overall workflow of the proposed CADTrans. In the initial stage, CADTrans undergoes training and distillation within the source domain, where the attention features from each layer are aggregated through EMA to generate global attention. The ADM block $\mathcal{G}_{s}$ is a trainable module trained by distilling the output of the classifier $\mathcal{C}_{s}$.
  • Figure 3: The second stage involves target adaptation. We first evaluate the pseudo-labels from both the target domain and the assistant domain to distinguish easy samples (dark color) from hard samples (light color) using consistency strategies. Then, we store the easy and hard samples separately in the memory bank and reassess the hard samples using consistent neighbors. Finally, we exploit CMK-MMD to align the hard samples with the easy samples.
  • Figure 4: Attention maps of the intermediate layers in the CADTrans (ViT-B) model. Right: the attention map of each layer. Left: the original image of the sample and the final global attention map aggregated by our approach.
  • Figure 5: Attention maps for images of a desk chair, a calculator, and a black package in the Office-31 dataset.
  • ...and 2 more figures
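
The captions of Figures 2 and 3 mention two mechanisms that a short sketch may help illustrate: aggregating per-layer attention with an exponential moving average (EMA), and splitting target samples into easy and hard sets by pseudo-label consistency. The code below is an illustration under stated assumptions, not the paper's procedure: the momentum value, the layer ordering, and the single "argmax agreement" rule are hypothetical (the paper uses multiple consistency strategies), as are the function names.

```python
def ema_aggregate(layer_maps, momentum=0.9):
    """Running EMA over per-layer attention maps.

    Each map is a flat list of floats; layers are visited in the order
    given. The momentum value is an assumed hyperparameter.
    """
    agg = list(layer_maps[0])
    for layer in layer_maps[1:]:
        agg = [momentum * a + (1.0 - momentum) * v for a, v in zip(agg, layer)]
    return agg

def argmax(probs):
    """Index of the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__)

def split_easy_hard(target_probs, assistant_probs):
    """Split sample indices by agreement of the two pseudo-labels.

    Assumed rule: a sample is 'easy' when the target head and the
    assistant head predict the same class, and 'hard' otherwise.
    """
    easy, hard = [], []
    for i, (p_t, p_a) in enumerate(zip(target_probs, assistant_probs)):
        (easy if argmax(p_t) == argmax(p_a) else hard).append(i)
    return easy, hard

# Toy usage with two layers and two samples.
global_attn = ema_aggregate([[1.0, 0.0], [0.0, 1.0]], momentum=0.5)
easy_idx, hard_idx = split_easy_hard(
    target_probs=[[0.9, 0.1], [0.4, 0.6]],
    assistant_probs=[[0.8, 0.2], [0.7, 0.3]],
)
```

In this toy case the two heads agree on the first sample and disagree on the second, so the second would be the one handed to the later reassessment and alignment steps.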