Plug-and-Play Transformer Modules for Test-Time Adaptation
Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury
TL;DR
PLUTO tackles the challenge of adapting large pre-trained transformers to many unseen, unlabeled test domains by pretraining a diverse store of parameter-efficient tuning (PET) modules on multiple source domains and loading them into a single transformer. A learnable module selector G weights and combines the outputs of the source modules for each target domain in a zero-/few-shot setting, forming logits $l(x) = \sum_j w(x)^j l(x)^j$, where $l(x)^j$ denotes the logits produced with module $j$ and $w(x)^j$ its input-dependent weight, and selecting a sparse subset of modules (typically $\le 5$) for inference. Adaptation also tunes the LayerNorm affine parameters by entropy minimization and uses sharpness-aware minimization to avoid collapse, which improves stability and mitigates forgetting. Evaluations on Digits-Five, Office-Home, CIFAR-10C, and ImageNet-C show that PLUTO consistently surpasses single-source TTA baselines and multi-source ensembles, especially in few-shot settings, while adding minimal parameter overhead. Overall, PLUTO offers a practical, edge-friendly approach to scalable domain adaptation built on modular, plug-and-play transformer components.
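For reference, an entropy-minimization objective over the combined prediction is commonly written in the form below. This is the standard Shannon-entropy loss, assumed here for illustration rather than taken from the paper; the symbols $p_c$, $C$ (number of classes), and $\mathcal{B}$ (a batch of unlabeled target samples) are introduced only for this sketch.

$$
l(x) = \sum_j w(x)^j \, l(x)^j, \qquad
p(x) = \mathrm{softmax}\big(l(x)\big), \qquad
\mathcal{L}_{\mathrm{ent}} = -\frac{1}{|\mathcal{B}|} \sum_{x \in \mathcal{B}} \sum_{c=1}^{C} p_c(x) \log p_c(x)
$$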
Abstract
Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model. However, the number of domains encountered at test time can be very large, and the data is usually unlabeled. Adaptation to new domains is therefore challenging, and it is impractical to generate a customized tuned module for each such domain. To address these challenges, this work introduces PLUTO: a Plug-and-pLay modUlar Test-time domain adaptatiOn strategy. We pre-train a large set of modules, each specialized for a different source domain, effectively creating a "module store". Given a target domain with few-shot unlabeled data, we introduce an unsupervised test-time adaptation (TTA) method to (1) select a sparse subset of relevant modules from this store and (2) create a weighted combination of the selected modules without tuning their weights. This plug-and-play nature enables us to harness the most relevant source domains in a single inference call. Comprehensive evaluations demonstrate that PLUTO uniformly outperforms alternative TTA methods and that selecting $\le 5$ modules suffices to extract most of the benefit. At a high level, our method equips pre-trained transformers with the capability to dynamically adapt to new domains, motivating a new paradigm for efficient and scalable domain adaptation.
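The two steps described above (sparse selection, then a weighted combination of frozen module outputs) can be illustrated with a minimal sketch. Everything named here is hypothetical: `combine_module_logits`, the `selector_scores` input standing in for the output of the module selector, the top-k rule, and the softmax normalization of the weights are assumptions made for illustration, not the paper's implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def combine_module_logits(module_logits, selector_scores, k=5):
    """Combine per-module logits using a sparse, normalized weighting.

    module_logits:   list of J logit vectors, one per source module (frozen).
    selector_scores: list of J relevance scores (assumed selector output).
    k:               maximum number of modules kept (assumed top-k rule).
    Returns (combined_logits, weights), where weights has J entries and
    at most k of them are non-zero.
    """
    num_modules = len(selector_scores)
    k = min(k, num_modules)

    # Step 1: keep only the k highest-scoring modules.
    top = sorted(range(num_modules),
                 key=lambda j: selector_scores[j],
                 reverse=True)[:k]

    # Step 2: normalize the scores of the kept modules into weights.
    top_weights = softmax([selector_scores[j] for j in top])
    weights = [0.0] * num_modules
    for j, w in zip(top, top_weights):
        weights[j] = w

    # Weighted sum of logits; the modules themselves are never updated.
    num_classes = len(module_logits[0])
    combined = [sum(weights[j] * module_logits[j][c] for j in top)
                for c in range(num_classes)]
    return combined, weights


if __name__ == "__main__":
    logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
    scores = [0.1, 2.0, -1.0, 1.5]
    combined, weights = combine_module_logits(logits, scores, k=2)
    print("weights:", weights)
    print("combined logits:", combined)
```

In this sketch only the scores change between target domains; the stored modules are reused as-is, which is what makes the combination plug-and-play.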
