Efficient ConvBN Blocks for Transfer Learning and Beyond

Kaichao You, Guo Qin, Anchang Bao, Meng Cao, Ping Huang, Jiulong Shan, Mingsheng Long

TL;DR

The paper addresses the efficiency bottleneck of ConvBN blocks in transfer learning. It analyzes why training in Deploy mode is less stable than in Eval mode and introduces Tune mode, which matches Eval mode's forward and backward behavior while approaching Deploy mode's efficiency. Using the associative law between convolution and affine transforms, Tune mode computes the transformed weights on the fly, reducing the memory footprint from O(X+Y) to O(X+ω′) (X is the input feature map, Y the convolution output, and ω′ the transformed weight) and cutting compute time relative to Eval mode. The authors provide theoretical memory and time analyses and validate Tune mode across 5 datasets and 12 models in classification, object detection, and adversarial example generation, reporting up to ~44% memory savings and up to ~9% speedups without accuracy loss. The method is integrated into PyTorch and MMCV/MMEngine, so practitioners can enable efficient ConvBN blocks with a one-line change.
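The associative-law argument can be checked numerically. The sketch below is a pure-Python, 1-D toy, not the authors' PyTorch implementation: the function names `conv1d`, `convbn_eval`, and `convbn_tune` and all numeric values are hypothetical, chosen only for illustration. It compares normalizing the convolution output with fixed statistics (Eval mode) against folding the per-channel scale into the weights and bias before convolving (the Tune-mode idea), and covers the forward pass only.

```python
import math

def conv1d(x, w, b):
    """Valid 1-D cross-correlation.
    x: [C_in][L], w: [C_out][C_in][K], b: [C_out] -> [C_out][L-K+1]"""
    c_in, k = len(w[0]), len(w[0][0])
    length = len(x[0]) - k + 1
    return [
        [b[o] + sum(w[o][i][j] * x[i][t + j]
                    for i in range(c_in) for j in range(k))
         for t in range(length)]
        for o in range(len(w))
    ]

def convbn_eval(x, w, b, mu, var, gamma, beta, eps=1e-5):
    """Eval mode: convolve first, then normalize the output Y
    with the tracked (fixed) statistics mu and var."""
    y = conv1d(x, w, b)
    return [[gamma[o] * (v - mu[o]) / math.sqrt(var[o] + eps) + beta[o]
             for v in row]
            for o, row in enumerate(y)]

def convbn_tune(x, w, b, mu, var, gamma, beta, eps=1e-5):
    """Tune-mode idea: fold the per-channel affine transform into the
    weights and bias on the fly, then run a single convolution."""
    n_out = len(w)
    scale = [gamma[o] / math.sqrt(var[o] + eps) for o in range(n_out)]
    w_t = [[[scale[o] * v for v in kernel] for kernel in w[o]]
           for o in range(n_out)]
    b_t = [scale[o] * (b[o] - mu[o]) + beta[o] for o in range(n_out)]
    return conv1d(x, w_t, b_t)

# Hypothetical toy data: 2 input channels, 2 output channels, kernel size 3.
x = [[1.0, 2.0, -1.0, 0.5, 3.0], [0.0, -2.0, 1.5, 1.0, -0.5]]
w = [[[0.2, -0.1, 0.4], [0.3, 0.0, -0.2]],
     [[-0.5, 0.1, 0.2], [0.05, 0.6, -0.3]]]
b = [0.1, -0.2]
mu, var = [0.3, -0.1], [1.5, 0.7]
gamma, beta = [1.2, 0.8], [0.0, 0.5]

out_eval = convbn_eval(x, w, b, mu, var, gamma, beta)
out_tune = convbn_tune(x, w, b, mu, var, gamma, beta)
max_diff = max(abs(p - q)
               for r1, r2 in zip(out_eval, out_tune)
               for p, q in zip(r1, r2))
print("max |eval - tune| =", max_diff)
```

The two outputs agree up to floating-point rounding. In a framework with automatic differentiation, the Eval path must keep both X and Y for the backward pass, whereas the folded path keeps X and the much smaller transformed weight ω′, which is the source of the O(X+Y) to O(X+ω′) reduction stated above.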

Abstract

Convolution-BatchNorm (ConvBN) blocks are integral components in various computer vision tasks and other domains. A ConvBN block can operate in three modes: Train, Eval, and Deploy. While the Train mode is indispensable for training models from scratch, the Eval mode is suitable for transfer learning and beyond, and the Deploy mode is designed for the deployment of models. This paper focuses on the trade-off between stability and efficiency in ConvBN blocks: Deploy mode is efficient but suffers from training instability; Eval mode is widely used in transfer learning but lacks efficiency. To solve the dilemma, we theoretically reveal the reason behind the diminished training stability observed in the Deploy mode. Subsequently, we propose a novel Tune mode to bridge the gap between Eval mode and Deploy mode. The proposed Tune mode is as stable as Eval mode for transfer learning, and its computational efficiency closely matches that of the Deploy mode. Through extensive experiments in object detection, classification, and adversarial example generation across $5$ datasets and $12$ model architectures, we demonstrate that the proposed Tune mode retains the performance while significantly reducing GPU memory footprint and training time, thereby contributing efficient ConvBN blocks for transfer learning and beyond. Our method has been integrated into both PyTorch (general machine learning framework) and MMCV/MMEngine (computer vision framework). Practitioners just need one line of code to enjoy our efficient ConvBN blocks thanks to PyTorch's builtin machine learning compilers.

Paper Structure

This paper contains 43 sections, 6 figures, and 12 tables.

Figures (6)

  • Figure 1: Usage of normalization layers in all 634 object detectors with pre-trained backbones in the MMDetection framework (Chen et al., 2019). GN denotes GroupNorm, SyncBN denotes BatchNorm synchronized across multiple GPUs, and Eval indicates training ConvBN blocks in Eval mode. A majority of detectors (over 78%) are trained with ConvBN blocks in Eval mode.
  • Figure 2: (a): Distribution of scaling coefficients for weight $\left(\gamma/\sqrt{\hat{\sigma}^2 + \epsilon}\right)$ in different backbones. (b): Comparison between training with Eval mode and Deploy mode in both object detection and classification. Severe performance degradation is observed for training with Deploy mode.
  • Figure 3: Memory footprint and running time comparison for Eval mode and Tune mode. The base setting is batch size $=32$ and input dimension $=224 \times 224$; batch size and input dimension are varied to test the efficiency.
  • Figure 4: Tune mode vs. Eval mode in adversarial example generation.
  • Figure 5: Training curves of Faster RCNN with ResNet101 and HRNet backbones. Models trained in Train mode show noticeable performance deterioration compared to Eval mode.
  • ...and 1 more figure