Towards Efficient Visual Adaption via Structural Re-parameterization

Gen Luo, Minglang Huang, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji

TL;DR

RepAdapter tackles the challenge of adapting giant vision models efficiently by introducing a sequential, re-parameterizable adapter that can be merged into pre-trained weights, yielding zero additional inference cost. It combines a dense-sparse, group-wise adapter design with careful placement to maximize performance gains while reducing parameter count. Across 27 benchmarks spanning image classification, video classification, and semantic segmentation, RepAdapter outperforms state-of-the-art PETL methods and demonstrates strong generalization across architectures. The work also provides efficiency analyses showing reduced training time and memory, which makes deploying large vision models more practical.

Abstract

Parameter-efficient transfer learning (PETL) is an emerging research topic aimed at inexpensively adapting large-scale pre-trained models to downstream tasks. Recent advances have achieved great success in saving storage costs for various pre-trained models by updating a small number of parameters instead of full tuning. However, we notice that most existing PETL methods still incur non-negligible latency during inference. In this paper, we propose a parameter-efficient and computationally friendly adapter for giant vision models, called RepAdapter. Specifically, we first prove that common adaptation modules can also be seamlessly integrated into most giant vision models via our structural re-parameterization, thereby achieving zero additional cost during inference. We then investigate the sparse design and effective placement of the adapter structure, which gives RepAdapter further advantages in parameter efficiency and performance. To validate RepAdapter, we conduct extensive experiments on 27 benchmark datasets of three vision tasks, i.e., image classification, video classification, and semantic segmentation. Experimental results show the superior performance and efficiency of RepAdapter over the state-of-the-art PETL methods. For instance, RepAdapter outperforms full tuning by +7.2% on average and saves up to 25% training time, 20% GPU memory, and 94.6% storage cost for ViT-B/16 on VTAB-1k. The generalization ability of RepAdapter is also validated on a range of vision models. Our source code is released at https://github.com/luogen1996/RepAdapter.
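The zero-inference-cost claim rests on a basic property of linear maps: a linear adapter placed directly before a linear layer can be folded into that layer's weights. The sketch below illustrates this in plain Python. It is a minimal illustration under stated assumptions, not the authors' implementation: the adapter is assumed to be already reduced to a single affine map, and all names (`W_a`, `b_a`, `W`, `b`) and sizes are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + c for v, c in zip(row, bias)] for row in m]

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: n tokens, feature width d, output width d_out.
n, d, d_out = 2, 4, 3
x = rand_matrix(n, d)

# Assumed adapter (already a single affine map): a(x) = x W_a + b_a
W_a, b_a = rand_matrix(d, d), rand_matrix(1, d)[0]
# Following pre-trained linear layer: y = a(x) W + b
W, b = rand_matrix(d, d_out), rand_matrix(1, d_out)[0]

# Training-time path: two matrix multiplications per forward pass.
y_two_step = add_bias(matmul(add_bias(matmul(x, W_a), b_a), W), b)

# Merge once after training: W' = W_a W and b' = b_a W + b.
W_merged = matmul(W_a, W)
b_merged = [p + q for p, q in zip(matmul([b_a], W)[0], b)]

# Inference-time path: one matrix multiplication, same shape as the original layer.
y_merged = add_bias(matmul(x, W_merged), b_merged)

max_diff = max(abs(p - q)
               for r1, r2 in zip(y_two_step, y_merged)
               for p, q in zip(r1, r2))
```

Because `W_merged` has the same shape as `W`, the merged layer costs the same at inference as the original pre-trained layer; the two paths agree up to floating-point rounding.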

Paper Structure (19 sections, 12 equations, 5 figures, 8 tables)

This paper contains 19 sections, 12 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Performance comparison of our RepAdapter and existing PETL methods (VPT, Adapter, AdaptFormer, LoRA, NOAH) on VTAB-1k. The vision model is ViT-B/16 and the inference speed is measured on an NVIDIA 3090 GPU with a batch size of 1. Most existing PETL methods incur non-negligible GPU latency during inference, while our RepAdapter does not.
  • Figure 2: Comparison of existing PETL methods (AdaptFormer, VPT, LoRA) and our RepAdapter. RepAdapter is deployed in a sequential manner, but it can be completely re-parameterized into the vision models during inference, enabling zero additional computational overhead. Its structure is also more lightweight than existing PETL methods.
  • Figure 3: Illustration of the structural re-parameterization of RepAdapter. (a) RepAdapter can be simplified to a linear projection after training. (b) The simplified weights can be merged into MHA and FFN.
  • Figure 4: The deployments of RepAdapter in ViT. There are four possible locations where RepAdapter can be inserted and re-parameterized. Our final deployments are in dark orange.
  • Figure 5: Comparisons of training time and memory overhead on an NVIDIA A100 GPU.
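
Step (a) of Figure 3, simplifying the adapter to a single linear projection, can be sketched as follows. This is an illustration under assumptions rather than the paper's exact formulation: the adapter is assumed to have the form x + s·(up(down(x))) with a dense down-projection, a group-wise (block-diagonal) up-projection, and no non-linearity in between. All names (`W_d`, `W_u`, `s`, `k`) and sizes are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + c for v, c in zip(row, bias)] for row in m]

def block_diag(blocks):
    """Place the given matrices along the diagonal of a zero matrix."""
    rows = sum(len(blk) for blk in blocks)
    cols = sum(len(blk[0]) for blk in blocks)
    out = [[0.0] * cols for _ in range(rows)]
    r0 = c0 = 0
    for blk in blocks:
        for i, row in enumerate(blk):
            for j, v in enumerate(row):
                out[r0 + i][c0 + j] = v
        r0 += len(blk)
        c0 += len(blk[0])
    return out

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: width d, bottleneck h, k groups, scale s.
d, h, k, s = 8, 4, 2, 0.5
x = rand_matrix(3, d)

W_d, b_d = rand_matrix(d, h), rand_matrix(1, h)[0]  # dense down-projection
# Group-wise up-projection: k independent (h/k x d/k) blocks, i.e. block-diagonal.
W_u = block_diag([rand_matrix(h // k, d // k) for _ in range(k)])
b_u = rand_matrix(1, d)[0]

# Training-time adapter: x + s * ((x W_d + b_d) W_u + b_u)
up = add_bias(matmul(add_bias(matmul(x, W_d), b_d), W_u), b_u)
y_adapter = [[xi + s * ui for xi, ui in zip(xr, ur)] for xr, ur in zip(x, up)]

# Collapsed form: W_ada = I + s W_d W_u and b_ada = s (b_d W_u + b_u)
WdWu = matmul(W_d, W_u)
W_ada = [[(1.0 if i == j else 0.0) + s * WdWu[i][j] for j in range(d)]
         for i in range(d)]
b_ada = [s * (p + q) for p, q in zip(matmul([b_d], W_u)[0], b_u)]
y_collapsed = add_bias(matmul(x, W_ada), b_ada)

max_diff = max(abs(p - q)
               for r1, r2 in zip(y_adapter, y_collapsed)
               for p, q in zip(r1, r2))
```

The collapse is only exact because the assumed adapter contains no non-linearity; the resulting `W_ada` and `b_ada` form the single affine map that step (b) would then fold into the adjacent MHA or FFN projection.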