EUDA: An Efficient Unsupervised Domain Adaptation via Self-Supervised Vision Transformer
Ali Abedi, Q. M. Jonathan Wu, Ning Zhang, Farhad Pourpanah
TL;DR
This paper tackles the inefficiency of state-of-the-art unsupervised domain adaptation methods by proposing EUDA. EUDA uses a frozen DINOv2 self-supervised vision transformer as a feature extractor, followed by a compact fully connected bottleneck. It introduces the Synergistic Domain Alignment Loss (SDAL), a weighted combination of cross-entropy and maximum mean discrepancy losses, which jointly minimizes source classification error and aligns the source and target feature distributions. Experiments on Office-31, Office-Home, VisDA-2017, and DomainNet show that EUDA achieves competitive or superior accuracy while greatly reducing the number of trainable parameters (up to 99.7% fewer on DomainNet), which makes it attractive for resource-constrained settings. The work demonstrates that self-supervised ViT backbones are practical for efficient domain adaptation and suggests broader applications in edge deployments and safety-critical domains.
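The weighted CE + MMD objective described above can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The following are all assumptions made for this example: the weights alpha and beta, the Gaussian (RBF) kernel with bandwidth sigma, the biased squared-MMD estimator, and the function names cross_entropy, mmd2 and sdal. Target labels are never used, which matches the unsupervised setting.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cross_entropy(logits: Vector, label: int) -> float:
    """CE for one sample: log-sum-exp of the logits minus the true-class logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def rbf_kernel(x: Vector, y: Vector, sigma: float) -> float:
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)); the kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(source: Sequence[Vector], target: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    def mean_k(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        return sum(rbf_kernel(x, y, sigma) for x in a for y in b) / (len(a) * len(b))
    return mean_k(source, source) + mean_k(target, target) - 2.0 * mean_k(source, target)


def sdal(source_logits: Sequence[Vector], source_labels: Sequence[int],
         source_feats: Sequence[Vector], target_feats: Sequence[Vector],
         alpha: float = 1.0, beta: float = 1.0, sigma: float = 1.0) -> float:
    """Hypothetical SDAL-style loss: alpha * mean source CE + beta * MMD^2(source, target)."""
    ce = sum(cross_entropy(z, y) for z, y in zip(source_logits, source_labels)) / len(source_labels)
    return alpha * ce + beta * mmd2(source_feats, target_feats, sigma)


if __name__ == "__main__":
    # Toy bottleneck features: the source and target clusters are far apart.
    src_feats = [[0.0, 0.0], [0.1, 0.0]]
    tgt_feats = [[3.0, 3.0], [3.1, 3.0]]
    # Classifier logits and labels exist only for the labeled source samples.
    logits, labels = [[2.0, 0.0], [0.0, 1.5]], [0, 1]
    print(sdal(logits, labels, src_feats, tgt_feats, alpha=1.0, beta=0.5))
```

In a real training loop, the same loss would be written with an autograd framework so that gradients flow into the bottleneck layers. Pure Python is used here only to keep the arithmetic easy to follow. Raising beta puts more weight on pulling the two feature distributions together, at some risk to source accuracy.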
Abstract
Unsupervised domain adaptation (UDA) aims to mitigate domain shift, where the distribution of the training (source) data differs from that of the testing (target) data. Many models have been developed to tackle this problem, and vision transformers (ViTs) have recently shown promising results. However, the complexity and large number of trainable parameters of ViTs restrict their deployment in practical applications. This underscores the need for an efficient model that not only reduces the number of trainable parameters but also allows its complexity to be adjusted to specific needs while delivering comparable performance. To this end, we introduce an Efficient Unsupervised Domain Adaptation (EUDA) framework. EUDA employs DINOv2, a self-supervised ViT, as a feature extractor, followed by a simplified bottleneck of fully connected layers that refines the features for domain adaptation. Additionally, EUDA uses the synergistic domain alignment loss (SDAL), which integrates cross-entropy (CE) and maximum mean discrepancy (MMD) losses to balance adaptation: the CE term minimizes classification errors in the source domain, while the MMD term aligns the source and target domain distributions. Experimental results show that EUDA achieves results comparable to other state-of-the-art domain adaptation methods with significantly fewer trainable parameters (42% to 99.7% fewer). This makes it possible to train the model in resource-limited environments. The code is available at: https://github.com/A-Abedi/EUDA.
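The parameter savings come from freezing the feature extractor, so that only the fully connected bottleneck and classifier are trained. The sketch below counts trainable parameters under that assumption. The layer widths (768, 256, 31), the backbone size and the helper names fc_param_count, trainable_params and percent_fewer are all hypothetical, chosen for illustration rather than taken from the paper.

```python
from typing import Sequence


def fc_param_count(dims: Sequence[int]) -> int:
    """Weights plus biases of a stack of fully connected layers with widths `dims`."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))


def trainable_params(backbone_params: int, head_dims: Sequence[int],
                     freeze_backbone: bool = True) -> int:
    """The FC head always trains; backbone weights count only if the backbone is not frozen."""
    head = fc_param_count(head_dims)
    return head if freeze_backbone else head + backbone_params


def percent_fewer(trainable: int, baseline_trainable: int) -> float:
    """Relative reduction in trainable parameters with respect to a baseline, in percent."""
    return 100.0 * (1.0 - trainable / baseline_trainable)


if __name__ == "__main__":
    BACKBONE = 86_000_000  # hypothetical backbone size, for illustration only
    HEAD = [768, 256, 31]  # hypothetical widths: embedding dim, bottleneck, 31 classes
    frozen = trainable_params(BACKBONE, HEAD, freeze_backbone=True)
    full = trainable_params(BACKBONE, HEAD, freeze_backbone=False)
    print(frozen, f"{percent_fewer(frozen, full):.2f}% fewer")
```

Comparing against a fully fine-tuned version of the same toy model isolates the effect of freezing. The reported 42% to 99.7% range is measured against other published methods, which this count does not attempt to reproduce. Adjusting the bottleneck widths is one way the number of trainable parameters, and therefore the model's complexity, can be tuned to a given budget.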
