Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, Yanqi Zhou, Nan Du, Vincent Y. Zhao, Yuexin Wu, Bo Li, Yu Zhang, Ming-Wei Chang
TL;DR
CoDA is a parameter-efficient transfer learning method that adds lightweight adapters and a learned router to a pretrained model to enable conditional computation. By running the heavy computation on only a small subset of tokens in each layer, CoDA achieves 2x to 8x inference speedups over state-of-the-art adapter approaches with moderate to no accuracy loss, while adding only a small number of new parameters. The method is evaluated on language, vision, and speech tasks; ablations show the importance of learned routing and that CoDA models can be pretrained cheaply starting from dense baselines. CoDA is also compatible with other parameter-efficient transfer learning (PETL) techniques such as LoRA, offering a practical path to deploying large pretrained models at scale.
Abstract
We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a lightweight training phase. Our experiments demonstrate that the CoDA approach provides an unexpectedly efficient way to transfer knowledge. Across a variety of language, vision, and speech tasks, CoDA achieves a 2x to 8x inference speed-up compared to state-of-the-art adapter approaches with moderate to no accuracy loss and the same parameter efficiency.
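To make the idea of conditional computation concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes a hard top-k router based on a single dot product, and toy stand-ins for the two branches: `heavy_fn` plays the role of the frozen pretrained computation and `adapter_fn` the role of the lightweight adapter. All names here (`conditional_layer`, `router_weights`, `heavy_fn`, `adapter_fn`, `k`) are hypothetical and chosen for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores, returned in ascending position order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def conditional_layer(
    tokens: List[Vector],
    router_weights: Vector,
    heavy_fn: Callable[[Vector], Vector],
    adapter_fn: Callable[[Vector], Vector],
    k: int,
) -> Tuple[List[Vector], List[int]]:
    """Apply the cheap adapter to every token and the heavy branch to k tokens.

    Assumption: the router is a linear scorer with hard top-k selection.
    """
    scores = [dot(router_weights, t) for t in tokens]
    selected = top_k_indices(scores, k)
    chosen = set(selected)

    outputs: List[Vector] = []
    for i, t in enumerate(tokens):
        # Every token gets the residual plus the lightweight adapter update.
        out = [x + a for x, a in zip(t, adapter_fn(t))]
        if i in chosen:
            # Only routed tokens pay for the expensive pretrained computation.
            out = [o + h for o, h in zip(out, heavy_fn(t))]
        outputs.append(out)
    return outputs, selected


# Toy usage: 4 tokens of width 2, heavy branch applied to only 2 of them.
tokens = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.5, 0.5]]
outputs, selected = conditional_layer(
    tokens,
    router_weights=[1.0, 0.0],
    heavy_fn=lambda t: [2.0 * x for x in t],    # stand-in for the frozen layer
    adapter_fn=lambda t: [0.5 * x for x in t],  # stand-in for the adapter
    k=2,
)
```

In this sketch the heavy branch runs on `k` of the tokens rather than all of them, which is where the inference saving would come from; the adapter branch still touches every token so that no token is left unprocessed. A trainable version would need a differentiable selection mechanism in place of the hard top-k used here.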
