Adapters Strike Back

Jan-Martin O. Steitz; Stefan Roth

TL;DR

Adapters can adapt vision transformers (ViTs) effectively with minimal parameter overhead, but prior results were inconsistent because of differing implementation choices. This work systematically analyzes adapter positions, inner structure, initializations, and data normalization, and introduces Adapter+, which combines a learnable channel-wise scaling with Post-Adapter placement. Adapter+ achieves state-of-the-art VTAB performance (77.6% average accuracy) without per-task hyperparameter optimization and reaches 90.7% on FGVC with a small parameter budget, outperforming more complex methods. The findings offer practical guidance for robust, scalable transfer learning of ViTs across diverse visual tasks and pretraining regimes.
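The adapter inner structure shown in Figure 3(a) (feed-forward down-projection, activation, feed-forward up-projection, optional layer normalization, and a scaling) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: we use GELU as the activation, omit the optional layer normalization, include biases in both projections, initialize the down-projection with small Gaussian weights and the up-projection with zeros (so the adapter initially outputs zeros), and start the channel-wise scaling at one. The paper studies initializations in detail, and these particular choices are not necessarily the ones it recommends. The class name `AdapterPlusSketch` is hypothetical.

```python
import math
import random
from typing import List

Vec = List[float]


def gelu(t: float) -> float:
    # Exact GELU via the Gaussian CDF: t * Phi(t)
    return 0.5 * t * (1.0 + math.erf(t / math.sqrt(2.0)))


class AdapterPlusSketch:
    """Bottleneck adapter with a learnable channel-wise scaling (hypothetical sketch).

    Computes scale * (W_up @ gelu(W_down @ x + b_down) + b_up) for one token vector x.
    The skip connection around the adapter is left to the caller.
    """

    def __init__(self, dim: int, rank: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection (rank x dim): small random weights (assumed initialization)
        self.w_down = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(rank)]
        self.b_down = [0.0] * rank
        # Up-projection (dim x rank): zeros, so the adapter starts as a no-op (assumed)
        self.w_up = [[0.0] * rank for _ in range(dim)]
        self.b_up = [0.0] * dim
        # Learnable channel-wise scaling: one factor per output channel
        self.scale = [1.0] * dim

    def __call__(self, x: Vec) -> Vec:
        # Project down to the bottleneck of size `rank` and apply the activation
        h = [gelu(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w_down, self.b_down)]
        # Project back up to `dim` channels
        y = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w_up, self.b_up)]
        # Scale every output channel by its own learnable factor
        return [s * yi for s, yi in zip(self.scale, y)]

    def num_params(self) -> int:
        # Trainable parameters: W_down, b_down, W_up, b_up, and the scaling vector
        rank, dim = len(self.w_down), len(self.b_up)
        return rank * dim + rank + dim * rank + dim + dim
```

As a usage example, `AdapterPlusSketch(768, 8).num_params()` returns 13,832; with one adapter in each of the 12 layers of a ViT-B-sized backbone (hidden size 768, an assumption here) that is 165,984 trainable parameters, a small fraction of the roughly 86M parameters of ViT-B itself. The channel-wise scaling costs only `dim` extra parameters per adapter while letting each output channel learn its own contribution strength.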

Abstract

Adapters provide an efficient and lightweight mechanism for adapting trained transformer models to a variety of different tasks. However, they have often been found to underperform other adaptation mechanisms, including low-rank adaptation. In this paper, we provide an in-depth study of adapters, their internal structure, and various implementation choices. We uncover pitfalls in using adapters and propose a concrete, improved adapter architecture, called Adapter+, that not only outperforms previous adapter implementations but also surpasses a number of other, more complex adaptation mechanisms in several challenging settings. At the same time, our adapter is highly robust and, unlike previous work, requires little to no manual intervention when addressing a novel scenario. Adapter+ reaches state-of-the-art average accuracy on the VTAB benchmark, even without per-task hyperparameter optimization.

Paper Structure (24 sections, 12 equations, 3 figures, 14 tables)

Introduction
Related work
Adapters for vision transformers
Vision transformer basics
Adapters and their inner structure
Adapter positions
Initialization of adapter parameters
Data normalization in pre-processing
Experiments
Datasets
Experimental settings
Exploring adapter configurations
Adapter position.
Main results
VTAB.
...and 9 more sections

Figures (3)

Figure 1: Parameter-accuracy characteristics of adaptation methods on the VTAB (Zhai et al., 2020) test sets. We report original results and re-evaluations ($\circlearrowright$) after a complete training schedule with suitable data normalization. Our Adapter+ clearly has the best parameter-accuracy trade-off. The vertical dashed line marks the minimal possible number of tunable parameters, obtained when only the classifiers are trained, i.e., with linear probing (61% accuracy).
Figure 2: Average accuracy for VTAB subgroups on the test sets. For methods marked with $\circlearrowright$, we report results of our re-evaluation after a complete training schedule with suitable data normalization to ensure a fair comparison. Adapter+ is evaluated with rank $r\!\in\![1..32]$.
Figure 3: Illustrations of (a) the inner structure of an adapter with feed-forward layers (FF), activation layer (Act), and optional layer normalization (LN) and scaling, (b)--(d) different possible adapter positions to connect the adapter to the FFN section of the transformer layer. Modules with trainable parameters are shown in red and frozen modules in blue.
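To make the adapter positions of Figure 3(b)-(d) concrete, the following minimal Python sketch connects a generic adapter to the FFN section of a transformer layer in three ways. Only Post-Adapter placement is named in the TL;DR; the labels for the other two variants (Pre- and Parallel-Adapter), the assumption that the serial variants carry their own skip connection around the adapter, and the toy stand-in functions are our assumptions for illustration, not the paper's exact equations.

```python
from typing import Callable, List

Vec = List[float]
Fn = Callable[[Vec], Vec]


def add(a: Vec, b: Vec) -> Vec:
    # Element-wise sum of two token vectors
    return [p + q for p, q in zip(a, b)]


def pre_adapter(x: Vec, ln: Fn, ffn: Fn, adapter: Fn) -> Vec:
    # Adapter (with its own skip) modifies the normalized input before the FFN
    h = ln(x)
    h = add(h, adapter(h))
    return add(x, ffn(h))


def post_adapter(x: Vec, ln: Fn, ffn: Fn, adapter: Fn) -> Vec:
    # Adapter (with its own skip) refines the FFN output before the layer residual
    y = ffn(ln(x))
    return add(x, add(y, adapter(y)))


def parallel_adapter(x: Vec, ln: Fn, ffn: Fn, adapter: Fn) -> Vec:
    # Adapter reads the same normalized input as the FFN; both outputs are summed
    h = ln(x)
    return add(x, add(ffn(h), adapter(h)))


# Hypothetical toy stand-ins: identity instead of LayerNorm,
# an element-wise square as a non-linear "FFN", and a scaled identity as the adapter
def identity(v: Vec) -> Vec:
    return list(v)


def square(v: Vec) -> Vec:
    return [t * t for t in v]


def small(v: Vec) -> Vec:
    return [0.1 * t for t in v]


x = [1.0, 2.0]
for variant in (pre_adapter, post_adapter, parallel_adapter):
    print(variant.__name__, variant(x, identity, square, small))
```

With these toy functions the three variants return approximately [2.21, 6.84], [2.1, 6.4] and [2.1, 6.2], respectively: the same adapter yields different layer outputs depending only on where it is attached. If the adapter outputs zeros, all three reduce to the frozen layer, x + FFN(LN(x)).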
