VirDA: Reusing Backbone for Unsupervised Domain Adaptation with Visual Reprogramming

Duy Nguyen; Dat Nguyen

TL;DR

VirDA addresses the inefficiency of traditional unsupervised domain adaptation (UDA), which fine-tunes a full backbone for every source-target pair. It keeps the backbone frozen and adds domain-specific visual reprogramming layers in front of it; these layers produce input-level visual prompts that shift style and texture toward a shared representation. The prompts are paired with domain-specific classifiers and trained with a dual objective that enforces inter-domain alignment and intra-domain robustness, all without altering backbone weights. Results on Digits, Office-31, and Office-Home show competitive accuracy with far fewer trainable parameters and less storage, and VirDA outperforms several parameter-efficient fine-tuning (PEFT) and full fine-tuning baselines in many settings. The work highlights the practicality of texture-aware prompting for cross-domain transfer and lays groundwork for extending the approach to other vision tasks.
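The arrangement described above (one frozen backbone shared by several domains, each with its own reprogramming layer and classifier) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the class names `FrozenBackbone`, `ReprogrammingLayer`, and `DomainModel` are hypothetical, the backbone is a toy fixed linear map, and the prompt is assumed to be a simple additive offset on the input.

```python
import random


class FrozenBackbone:
    """Toy stand-in for a pretrained backbone: a fixed linear map, never updated."""

    def __init__(self, in_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        # Tuples make the weights immutable, mirroring a frozen backbone.
        self.weights = tuple(
            tuple(rng.uniform(-1.0, 1.0) for _ in range(in_dim))
            for _ in range(feat_dim)
        )

    def __call__(self, x):
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]


class ReprogrammingLayer:
    """Hypothetical domain-specific layer (assumption: an additive input prompt)."""

    def __init__(self, in_dim):
        self.prompt = [0.0] * in_dim  # trainable offsets, one per input value

    def __call__(self, x):
        return [v + p for v, p in zip(x, self.prompt)]


class DomainModel:
    """Shared frozen backbone plus a per-domain prompt and linear classifier."""

    def __init__(self, backbone, in_dim, feat_dim, num_classes):
        self.backbone = backbone  # shared reference, not a copy
        self.reprogram = ReprogrammingLayer(in_dim)
        self.head = [[0.0] * feat_dim for _ in range(num_classes)]

    def features(self, x):
        return self.backbone(self.reprogram(x))

    def forward(self, x):
        feats = self.features(x)
        return [sum(w * f for w, f in zip(row, feats)) for row in self.head]

    def trainable_params(self):
        # Only the prompt and the classifier count; the backbone is excluded.
        return len(self.reprogram.prompt) + sum(len(row) for row in self.head)


backbone = FrozenBackbone(in_dim=16, feat_dim=4)
source_model = DomainModel(backbone, in_dim=16, feat_dim=4, num_classes=3)
target_model = DomainModel(backbone, in_dim=16, feat_dim=4, num_classes=3)

image = [0.1 * i for i in range(16)]
target_model.reprogram.prompt[0] = 0.5  # pretend one training step touched the prompt

source_feats = source_model.features(image)
target_feats = target_model.features(image)
```

In this sketch the two domain models hold the same backbone object, so changing the target prompt alters the target features while the source path and the backbone weights stay untouched.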

Abstract

Existing UDA pipelines fine-tune an already well-trained backbone for every new source-target pair. As a result, the number of trainable parameters and the storage footprint grow linearly with the number of pairs, and the well-trained backbone parameters cannot be reused. Inspired by recent findings that existing backbones have textural biases, we propose VirDA, which exploits domain-specific textural bias for domain adaptation via visual reprogramming. Instead of fine-tuning the full backbone, VirDA prepends a domain-specific visual reprogramming layer to it. This layer produces visual prompts that act as an added textural bias on the input image, adapting its "style" to a target domain. To optimize the visual reprogramming layers, we use multiple objective functions that reduce the intra- and inter-domain distribution differences when the domain-adapting visual prompts are applied. This process does not modify the backbone parameters, so the same backbone can be reused across different domains. We evaluate VirDA on Office-31 and obtain 92.8% mean accuracy with only 1.5M trainable parameters. VirDA surpasses PDA, the state-of-the-art parameter-efficient UDA baseline, by +1.6% accuracy while using just 46% of its parameters. Compared with full-backbone fine-tuning, VirDA outperforms CDTrans and FixBi by +0.2% and +1.4%, respectively, while requiring only 1.7% and 2.8% of their trainable parameters. Relative to the strongest current methods (PMTrans and TVT), VirDA uses ~1.7% of their parameters and trades off only 2.2% and 1.1% accuracy, respectively.
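The linear-growth argument can be made concrete with a back-of-the-envelope calculation. The sketch below uses only the figures quoted in the abstract (1.5M trainable parameters, 1.7% of CDTrans's trainable parameters). The baseline size is a derived estimate, not a reported number, and the calculation assumes that the 1.5M figure applies per source-target pair and that the shared frozen backbone is roughly as large as the fine-tuned baseline. The helper `stored_params` is hypothetical.

```python
# Figures quoted in the abstract.
VIRDA_TRAINABLE = 1.5e6        # trainable parameters reported for VirDA
FRACTION_OF_CDTRANS = 0.017    # VirDA uses "1.7%" of CDTrans's trainable parameters

# Derived estimate (assumption): size of a fully fine-tuned baseline model.
full_ft_params = VIRDA_TRAINABLE / FRACTION_OF_CDTRANS


def stored_params(num_pairs, per_pair, shared=0.0):
    """Parameters kept on disk: one shared block plus one block per domain pair."""
    return shared + num_pairs * per_pair


# Office-31 has 3 domains, hence 6 ordered source-target pairs.
for num_pairs in (1, 6):
    full = stored_params(num_pairs, full_ft_params)
    reused = stored_params(num_pairs, VIRDA_TRAINABLE, shared=full_ft_params)
    print(f"{num_pairs} pair(s): full fine-tuning {full / 1e6:.1f}M, "
          f"frozen backbone + prompts {reused / 1e6:.1f}M")
```

Under these assumptions the frozen-backbone setup costs slightly more for a single pair, because the backbone still has to be stored once, and becomes cheaper from the second pair onward.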
