Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets

Hao Chen, Ran Tao, Han Zhang, Yidong Wang, Xiang Li, Wei Ye, Jindong Wang, Guosheng Hu, Marios Savvides

TL;DR

Conv-Adapter is a lightweight, parameter-efficient transfer-learning module for ConvNets that freezes the backbone and learns small, task-specific feature modulations. Using a bottleneck of depth-wise separable convolutions and four adapting schemes, it maintains spatial locality and receptive-field alignment, and it matches or exceeds full fine-tuning on 23 classification tasks while training only about 3.5% of the parameters that full fine-tuning of ResNet50 would update. It also shows strong few-shot improvements and transfers effectively to object detection and semantic segmentation. The paper then analyzes the learned behavior with CKA and MMD to explain when and why the method works. The approach broadens the applicability of parameter-efficient transfer learning in CV, with practical benefits for deployment and multi-domain adaptation, and it points to future work on domain robustness and architecture-aware design.

Abstract

While parameter-efficient tuning (PET) methods have shown great potential with transformer architectures on Natural Language Processing (NLP) tasks, their effectiveness with large-scale ConvNets on Computer Vision (CV) tasks is still under-studied. This paper proposes Conv-Adapter, a PET module designed for ConvNets. Conv-Adapter is light-weight, domain-transferable, and architecture-agnostic, with generalized performance on different tasks. When transferring to downstream tasks, Conv-Adapter learns task-specific feature modulation of the intermediate representations of the backbone while keeping the pre-trained parameters frozen. It introduces only a tiny number of learnable parameters, e.g., only 3.5% of the full fine-tuning parameters of ResNet50, and it can also be applied to transformer-based backbones. Conv-Adapter outperforms previous PET baseline methods and matches or surpasses the performance of full fine-tuning on 23 classification tasks from various domains. It also presents superior performance on few-shot classification, with an average margin of 3.39%. Beyond classification, Conv-Adapter generalizes to detection and segmentation tasks with more than 50% fewer parameters but performance comparable to traditional full fine-tuning.

Paper Structure (46 sections, 4 equations, 7 figures, 14 tables)

This paper contains 46 sections, 4 equations, 7 figures, 14 tables.

Figures (7)

  • Figure 1: Performance of Conv-Adapter compared to other transfer learning methods on ResNet-50 BiT-M. For 23 image classification datasets from various domains, we report the performance gain relative to fine-tuning against the percentage of trainable backbone parameters (w/o linear head), with mean and standard deviation highlighted. Conv-Adapter achieves a superior trade-off between transfer accuracy and parameter efficiency.
  • Figure 2: Architecture of Conv-Adapter, which has a bottleneck composed of depth-wise separable convolutions with a non-linear activation. $C_{in}$, $C_{out}$, $H$, and $W$ are kept the same as in the backbone. $\boldsymbol{\alpha}$ and $\gamma$ are hyper-parameters to tune.
  • Figure 3: Four adapting schemes of Conv-Adapter to ResNet50: Convolution Parallel, Convolution Sequential, Residual Parallel, and Residual Sequential. The schemes differ in the position of the modified representation and the corresponding insertion form. Other networks can be adapted similarly following the illustration. Green modules are frozen during fine-tuning.
  • Figure 4: Sensitivity to the initialization of the learnable scaling vector $\boldsymbol{\alpha}$ and to the compression factor $\gamma$.
  • Figure 5: Sensitivity to the kernel size of the depth-wise convolution in Conv-Adapter, for both ResNet50 and ConvNeXt-B.
  • ...and 2 more figures
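
The Figure 2 caption names a bottleneck of depth-wise separable convolutions, a compression factor $\gamma$, and a learnable scaling vector $\boldsymbol{\alpha}$. To give a feel for why such a module is cheap, the sketch below counts trainable parameters under an assumed layout that the captions above do not spell out: a grouped $K \times K$ convolution that reduces $C_{in}$ to $C_{in}/\gamma$ channels, a point-wise $1 \times 1$ convolution that expands to $C_{out}$, and one scaling value per output channel, all without biases. The function names and the example sizes (256 channels, $K = 3$, $\gamma = 4$) are illustrative assumptions, not values taken from the paper.

```python
def conv_params(c_in, c_out, k, groups=1, bias=False):
    """Number of weights in a 2-D convolution with a square k x k kernel."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = c_out * (c_in // groups) * k * k
    return weights + (c_out if bias else 0)


def adapter_params(c_in, c_out, k, gamma):
    """Trainable parameters of the ASSUMED bottleneck layout (no biases).

    Assumption: grouped k x k down-projection, 1 x 1 up-projection,
    and a per-channel scaling vector alpha of length c_out.
    """
    if c_in % gamma:
        raise ValueError("c_in must be divisible by gamma")
    hidden = c_in // gamma
    down = conv_params(c_in, hidden, k, groups=hidden)  # grouped k x k conv
    up = conv_params(hidden, c_out, 1)                  # point-wise 1 x 1 conv
    alpha = c_out                                       # scaling vector
    return down + up + alpha


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    c_in = c_out = 256
    k, gamma = 3, 4
    full = conv_params(c_in, c_out, k)        # frozen backbone convolution
    adapter = adapter_params(c_in, c_out, k, gamma)
    print(f"frozen conv: {full}, adapter: {adapter}, ratio: {adapter / full:.2%}")
```

Under these assumptions the adapter for a single layer costs a few percent of the convolution it sits next to. The ratio for a whole network depends on which layers are adapted and on the chosen $\gamma$, so this per-layer figure should not be read as a reproduction of the numbers reported in the paper.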