CST: Calibration Side-Tuning for Parameter and Memory Efficient Transfer Learning
Feng Chen
TL;DR
Calibration Side Tuning (CST) targets parameter- and memory-efficient transfer learning for object detection by adapting transformer-inspired fine-tuning techniques to ResNet backbones. CST employs a dedicated side network with Maximal Transition Calibration (MTC) gating to fuse backbone and side features, achieving strong accuracy while keeping added parameters and memory footprint small. Through extensive ablations and comparisons on five datasets, CST outperforms existing PETL methods and offers a favorable trade-off between model complexity and performance. This work demonstrates practical benefits for deploying fine-tuned detectors under resource constraints in diverse settings.
Abstract
Achieving universally high accuracy in object detection is challenging, and industry practice currently focuses on detecting specific classes of objects. However, deploying one or more object detection networks requires GPU memory for training and storage capacity for inference, which makes it difficult to coordinate multiple object detection tasks under resource-constrained conditions. This paper introduces a lightweight fine-tuning strategy called Calibration Side Tuning (CST), which combines aspects of adapter tuning and side tuning to adapt techniques that have proven successful in transformers for use with ResNet. The CST architecture incorporates maximal transition calibration and uses a small number of additional parameters to enhance network performance while maintaining a smooth training process. Furthermore, this paper analyzes multiple fine-tuning strategies and implements them within ResNet, thereby extending research on fine-tuning strategies for object detection networks. Extensive experiments on five benchmark datasets show that the proposed method outperforms the compared state-of-the-art techniques and achieves a better balance between the complexity and performance of fine-tuning schemes.
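As a rough illustration of the side-tuning idea summarized in the abstract, the sketch below blends a feature from a frozen backbone stage with a feature from a small trainable side branch through a learnable scalar gate. It is a generic sigmoid-gated fusion, not the paper's maximal transition calibration; the function names, the single linear side layer, and all numeric values are hypothetical and chosen only to keep the example self-contained.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def side_branch(x, weights, bias):
    """Tiny trainable linear layer + ReLU, a stand-in for a side network."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def gated_fuse(backbone_feat, side_feat, alpha):
    """Blend frozen-backbone and side features with a scalar gate g = sigmoid(alpha)."""
    g = sigmoid(alpha)
    return [g * b + (1.0 - g) * s for b, s in zip(backbone_feat, side_feat)]

# Hypothetical values: the backbone feature is treated as a constant (frozen),
# while side_w, side_b, and alpha are the only trainable quantities.
backbone_feat = [1.0, 2.0, 3.0]
x = [0.5, -1.0]
side_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
side_b = [0.0, 0.0, 0.0]
alpha = 0.0  # gate starts at 0.5, weighting both paths equally

side_feat = side_branch(x, side_w, side_b)
fused = gated_fuse(backbone_feat, side_feat, alpha)

# Count of trainable scalars in this toy setup: side weights, side biases, gate.
trainable = sum(len(row) for row in side_w) + len(side_b) + 1
```

In this toy setup only the side branch and the gate would receive gradients, which is the reason side-tuning approaches can keep both the added parameter count and the training memory small: no activations of the frozen backbone need to be stored for backpropagation.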
