Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets
Hao Chen, Ran Tao, Han Zhang, Yidong Wang, Xiang Li, Wei Ye, Jindong Wang, Guosheng Hu, Marios Savvides
TL;DR
Conv-Adapter is a lightweight, parameter-efficient transfer-learning module for ConvNets: it freezes the pre-trained backbone and learns small, task-specific modulations of its features. Built as a bottleneck of depth-wise separable convolutions and evaluated under four adaptation schemes, it preserves spatial locality and receptive-field alignment. It performs comparably to or better than full fine-tuning on 23 classification tasks while training only about 3.5% as many backbone parameters. It also yields strong few-shot improvements and transfers effectively to object detection and semantic segmentation. The paper then analyzes its behavior with centered kernel alignment (CKA) and maximum mean discrepancy (MMD) to explain when and why it works. The approach broadens the applicability of parameter-efficient transfer learning in computer vision and offers practical benefits for deployment and multi-domain adaptation, while pointing to domain robustness and architecture-aware design as areas for future work.
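To make "lightweight" concrete, the Python sketch below counts the weights of one plausible depth-wise separable bottleneck and compares them with a dense convolution of the same kernel size. The layout (1×1 down-projection, k×k depth-wise convolution, 1×1 up-projection, per-channel scale), the reduction factor `gamma`, and the layer sizes are illustrative assumptions, not the paper's exact configuration, and the resulting ratio is not meant to reproduce the reported 3.5%.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a dense k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def adapter_params(c_in: int, c_out: int, k: int, gamma: int) -> int:
    """Weights of a hypothetical depth-wise separable bottleneck adapter.

    Assumed layout: 1x1 down-projection c_in -> c_mid, k x k depth-wise conv
    on c_mid channels (same k as the adapted layer), 1x1 up-projection
    c_mid -> c_out, plus a per-channel output scale. gamma is an assumed
    channel-reduction factor.
    """
    c_mid = max(1, c_in // gamma)
    down = c_in * c_mid          # point-wise down-projection
    depthwise = c_mid * k * k    # one k x k filter per bottleneck channel
    up = c_mid * c_out           # point-wise up-projection
    scale = c_out                # per-channel modulation weights
    return down + depthwise + up + scale


# Illustrative sizes: a 3x3 layer with 256 input and 256 output channels.
full = standard_conv_params(256, 256, 3)
adapter = adapter_params(256, 256, 3, gamma=4)
ratio = adapter / full
print(f"{adapter} / {full} = {ratio:.1%}")
```

With these assumed sizes it prints `33600 / 589824 = 5.7%`. Almost all of the adapter's cost sits in the two point-wise projections, so in this layout `gamma` is the main knob controlling size; the depth-wise k×k filter adds spatial context for very few weights.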
Abstract
While parameter-efficient tuning (PET) methods have shown great potential with transformer architectures on Natural Language Processing (NLP) tasks, their effectiveness with large-scale ConvNets on Computer Vision (CV) tasks is still under-studied. This paper proposes Conv-Adapter, a PET module designed for ConvNets. Conv-Adapter is lightweight, domain-transferable, and architecture-agnostic, with generalized performance across different tasks. When transferring to downstream tasks, Conv-Adapter learns task-specific modulations of the backbone's intermediate representations while keeping the pre-trained parameters frozen. It introduces only a small number of learnable parameters, e.g., only 3.5% of the full fine-tuning parameters of ResNet50, and it can also be applied to transformer-based backbones. Conv-Adapter outperforms previous PET baseline methods and achieves performance comparable to or better than full fine-tuning on 23 classification tasks from various domains. It also shows superior few-shot classification performance, with an average margin of 3.39%. Beyond classification, Conv-Adapter generalizes to detection and segmentation tasks, using more than 50% fewer parameters while performing comparably to traditional full fine-tuning.
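The frozen-backbone idea can be sketched in pure Python on tiny 1-D feature maps. Assuming a parallel placement (one of several possible adaptation schemes), a frozen layer's output `h` is modulated by a trainable depth-wise-then-point-wise branch, scaled per channel by `alpha`. The function names, the 1-D setting, and the 1×1 stand-in for the pre-trained layer are hypothetical simplifications for illustration, not the paper's implementation.

```python
from typing import List

Feature = List[List[float]]  # layout: [channels][positions]


def depthwise_conv1d(x: Feature, kernels: List[List[float]]) -> Feature:
    """Filter each channel with its own odd-length kernel ('same' zero padding).

    Computed as cross-correlation, as deep-learning conv layers do.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        pad, n = len(kernel) // 2, len(channel)
        row = []
        for i in range(n):
            acc = 0.0
            for j, w in enumerate(kernel):
                idx = i + j - pad
                if 0 <= idx < n:
                    acc += w * channel[idx]
            row.append(acc)
        out.append(row)
    return out


def pointwise_conv1d(x: Feature, weight: List[List[float]]) -> Feature:
    """1x1 convolution: weight[out_channel][in_channel] mixes channels per position."""
    n = len(x[0])
    return [[sum(w * x[c][i] for c, w in enumerate(w_row)) for i in range(n)]
            for w_row in weight]


def adapted_layer(x, frozen_weight, dw_kernels, pw_weight, alpha) -> Feature:
    """Hypothetical parallel adapter: frozen output plus alpha-scaled adapter branch."""
    h = pointwise_conv1d(x, frozen_weight)  # stand-in for a frozen pre-trained layer
    delta = pointwise_conv1d(depthwise_conv1d(x, dw_kernels), pw_weight)  # trainable branch
    return [[h_c[i] + a * d_c[i] for i in range(len(h_c))]
            for h_c, d_c, a in zip(h, delta, alpha)]


x = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]   # 2 channels x 3 positions
frozen = [[1.0, 0.0], [0.0, 1.0]]          # identity "pre-trained" weights, never updated
dw = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]    # trainable depth-wise kernels
pw = [[1.0, 1.0], [0.0, 2.0]]              # trainable 1x1 channel mixing

print(adapted_layer(x, frozen, dw, pw, alpha=[0.0, 0.0]))  # identical to the frozen output
print(adapted_layer(x, frozen, dw, pw, alpha=[0.1, 0.1]))  # features shifted by the adapter
```

Setting `alpha` to zero reproduces the frozen layer's output exactly, so in this sketch the adapted model starts from the pre-trained behavior, and during transfer only the branch weights and `alpha` would be updated while `frozen` stays fixed.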
