
Plug-and-Play Transformer Modules for Test-Time Adaptation

Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury

TL;DR

PLUTO tackles the challenge of adapting large pre-trained transformers to many unseen, unlabeled test domains by pretraining a diverse store of parameter-efficient tuning (PET) modules on multiple source domains and loading them into a single transformer. A learnable module selector $G$ adaptively weights and combines the outputs of multiple source modules in a target-specific, zero-/few-shot setting, forming logits $l(x) = \sum_j w(x)^j l(x)^j$ and selecting a sparse subset of modules (typically $\le 5$) for inference. The adaptation further tunes the LayerNorm affine parameters with entropy minimization and employs sharpness-aware minimization to avoid collapse, improving stability and preventing forgetting. Evaluations on Digits-Five, Office-Home, CIFAR-10C, and ImageNet-C show that PLUTO consistently surpasses single-source TTA baselines and multi-source ensembles, especially in few-shot settings, while maintaining minimal parameter overhead. Overall, PLUTO provides a practical, edge-friendly paradigm for scalable domain adaptation by leveraging a modular, plug-and-play approach to transformer adaptation.
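As a rough illustration of the weighted logit combination and the entropy objective mentioned in the summary, the following is a minimal sketch in plain Python. It assumes the per-module weights for an input are already given (in PLUTO they come from the learned selector), and the function names `combine_logits` and `prediction_entropy` are hypothetical, chosen here for illustration only.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(module_logits, weights):
    """Return l(x) = sum_j w_j * l_j(x) for a single input x.

    module_logits: one logit vector (list of floats) per source module.
    weights: one weight per module; assumed here to be non-negative
             and to sum to 1 (an assumption of this sketch).
    """
    num_classes = len(module_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, module_logits))
        for c in range(num_classes)
    ]


def prediction_entropy(logits):
    """Shannon entropy of softmax(logits); lower means more confident."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Two hypothetical source modules scoring three classes.
module_logits = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
weights = [0.75, 0.25]
combined = combine_logits(module_logits, weights)
entropy = prediction_entropy(combined)
```

Entropy minimization would then adjust trainable parameters (per the summary, the LayerNorm affine parameters) so that `entropy` decreases on unlabeled target inputs; the gradient step itself is omitted from this sketch.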

Abstract

Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model. However, the number of domains encountered during test time can be very large, and the data is usually unlabeled. Thus, adaptation to new domains is challenging; it is also impractical to generate customized tuned modules for each such domain. Toward addressing these challenges, this work introduces PLUTO: a Plug-and-pLay modUlar Test-time domain adaptatiOn strategy. We pre-train a large set of modules, each specialized for a different source domain, effectively creating a "module store". Given a target domain with few-shot unlabeled data, we introduce an unsupervised test-time adaptation (TTA) method to (1) select a sparse subset of relevant modules from this store and (2) create a weighted combination of the selected modules without tuning their weights. This plug-and-play nature enables us to harness the most relevant source domains in a single inference call. Comprehensive evaluations demonstrate that PLUTO uniformly outperforms alternative TTA methods and that selecting $\le 5$ modules suffices to extract most of the benefit. At a high level, our method equips pre-trained transformers with the capability to dynamically adapt to new domains, motivating a new paradigm for efficient and scalable domain adaptation.
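To make the idea of selecting a sparse subset of modules concrete, here is a small sketch that keeps only the highest-weighted modules and renormalizes the remaining weights. The top-k rule with renormalization is an assumption made for illustration; the abstract does not specify how the sparse subset is chosen, and `select_top_k` is a hypothetical helper, not part of PLUTO.

```python
def select_top_k(weights, k=5):
    """Keep the k largest module weights, zero the rest, and renormalize.

    weights: one non-negative relevance weight per module in the store.
    k: maximum number of modules to keep (5 mirrors the abstract's
       observation that at most five modules give most of the benefit).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Indices of modules sorted from most to least relevant.
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    keep = set(ranked[:k])
    kept_total = sum(weights[j] for j in keep)
    if kept_total <= 0.0:
        raise ValueError("selected weights must have a positive sum")
    return [
        weights[j] / kept_total if j in keep else 0.0
        for j in range(len(weights))
    ]


# Hypothetical relevance weights for a store of four modules.
store_weights = [0.5, 0.25, 0.125, 0.125]
sparse_weights = select_top_k(store_weights, k=2)
active_modules = [j for j, w in enumerate(sparse_weights) if w > 0.0]
```

Only the modules listed in `active_modules` would need to be evaluated at inference time, which is what keeps the cost of combining many source domains low.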

Paper Structure (27 sections, 12 equations, 2 figures, 13 tables, 2 algorithms)

This paper contains 27 sections, 12 equations, 2 figures, 13 tables, 2 algorithms.

Figures (2)

  • Figure 1: The overview of PLUTO. At test time, PLUTO efficiently combines the sources using appropriate weights determined by the current test distribution. Furthermore, we selectively update the LayerNorm (LN) parameters of the model that demonstrates the highest correlation with the test distribution. The numbers in the figure are examples provided for illustrative purposes.
  • Figure A: Examples of each corruption type in the image corruptions benchmark. While synthetic, this set of corruptions aims to represent natural factors of variation like noise, blur, weather, and digital imaging effects.