Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang

TL;DR

This work proposes a Residual-based Low-Rank Rescaling (RLRR) fine-tuning strategy that not only enhances flexibility in parameter tuning but also, through a residual design, ensures that the new parameters do not deviate excessively from the pre-trained model.
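The residual design can be illustrated with a short Python/NumPy sketch. It assumes the update has the form delta_W = diag(s_left) W diag(s_right). This is our reading of Figure 1, not necessarily the paper's exact equation, and the matrix sizes and scale values are arbitrary stand-ins. Under that assumption, each entry of delta_W equals the corresponding pre-trained entry multiplied by a product of two learned scales. Small scales therefore give small deviations from W, and zero scales recover W exactly.

```python
import numpy as np


def residual_rescale(W, s_left, s_right):
    """Return W + delta_W with delta_W = diag(s_left) @ W @ diag(s_right) (assumed form)."""
    delta_W = s_left[:, None] * W * s_right[None, :]  # broadcasting replaces the diag products
    return W + delta_W


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))           # stand-in for a frozen pre-trained matrix
s_left = 0.05 * rng.standard_normal(4)    # hypothetical learned row scales
s_right = 0.05 * rng.standard_normal(3)   # hypothetical learned column scales

W_new = residual_rescale(W, s_left, s_right)

# Each entry moves by |s_left[i] * s_right[j]| times its own pre-trained magnitude.
bound = np.abs(np.outer(s_left, s_right)) * np.abs(W)
print(np.all(np.abs(W_new - W) <= bound + 1e-12))   # True

# With zero scales, the adapted matrix is exactly the pre-trained one.
print(np.array_equal(residual_rescale(W, np.zeros(4), np.zeros(3)), W))  # True
```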

Abstract

Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adapt a model to downstream tasks by learning a minimal set of new adaptation parameters while keeping the majority of pre-trained parameters frozen. A key challenge is striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features. Currently, little attention has been paid to guiding this delicate trade-off. In this study, we approach the problem from the perspective of the Singular Value Decomposition (SVD) of pre-trained parameter matrices, which provides insight into the tuning dynamics of existing methods. Building on this understanding, we propose a Residual-based Low-Rank Rescaling (RLRR) fine-tuning strategy. This strategy not only enhances flexibility in parameter tuning but also, through a residual design, ensures that the new parameters do not deviate excessively from the pre-trained model. Extensive experiments demonstrate that our method achieves competitive performance across various downstream image classification tasks while introducing a number of new parameters comparable to existing methods. We believe this work takes a step toward a unified perspective for interpreting existing methods and will motivate new approaches that more effectively address this crucial trade-off. Our code is available at https://github.com/zstarN70/RLRR.git.
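A small numerical check can illustrate the SVD perspective. Again, this is a sketch under an assumption: the adaptation rescales the rows and columns of a frozen matrix W with vectors s_left and s_right. The names follow Figure 1, and the random data are placeholders. If W = U Sigma V^T, then diag(s_left) W diag(s_right) = (diag(s_left) U) Sigma (V^T diag(s_right)). In other words, the rescaling acts on the singular-vector factors while the singular values stay in the middle. Note that diag(s_left) U is generally no longer orthonormal, so this factorization is not itself an SVD of the rescaled matrix. The same product also equals an element-wise multiplication of W by the rank-1 matrix s_left s_right^T.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((5, 4))    # stand-in for a frozen pre-trained matrix
s_left = rng.standard_normal(5)    # hypothetical row scales
s_right = rng.standard_normal(4)   # hypothetical column scales

# Thin SVD of the pre-trained matrix: W = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Row/column rescaling of W ...
scaled = np.diag(s_left) @ W @ np.diag(s_right)
# ... equals rescaling the rows of U and the columns of Vt, with S untouched ...
via_svd = (s_left[:, None] * U) @ np.diag(S) @ (Vt * s_right[None, :])
# ... and equals an element-wise product with the rank-1 matrix outer(s_left, s_right).
via_outer = W * np.outer(s_left, s_right)

print(np.allclose(scaled, via_svd), np.allclose(scaled, via_outer))  # True True
```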

Paper Structure (22 sections, 12 equations, 3 figures, 18 tables)

Figures (3)

  • Figure 1: Illustration of the proposed RLRR method. For any weight matrix $\mathbf{W}^{(l)}$ in the MHA and FFN modules, we fine-tune the frozen pre-trained parameter matrix using a residual structure: the frozen matrix is combined with a low-rank-based scaling and shifting operation, i.e., $\bigtriangleup\mathbf{W}^{(l)}$. From the perspective of SVD, the scaling vectors $\vec{\boldsymbol{s}}_{\rm left}^{(l)}$ and $\vec{\boldsymbol{s}}_{\rm right}^{(l)}$ and the shifting vector $\vec{\boldsymbol{f}}^{(l)}$ can also be interpreted as adjustments to the rows and columns of the pre-trained matrix $\mathbf{W}^{(l)}$.
  • Figure 2: Ablation study using the ViT-B/16 backbone on the CIFAR-100 dataset, evaluating the impact of incorporating RLRR adaptation across different combinations of modules and layers.
  • Figure 3: Illustration of the RLRR method's extension to CNNs.
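The sketch below shows one way the pieces of Figure 1 could fit together in a single adapted linear layer, written in NumPy. Several choices here are our own assumptions, not the authors' released implementation:

- the class name RLRRLinear;
- the row-vector convention y = x W + b, with W of shape (d_in, d_out);
- the placement of the shift f on the layer output;
- the initialization.

Only s_left, s_right and f would be trained, so under this parameterization the trainable count per layer is d_in + 2 d_out.

```python
import numpy as np


class RLRRLinear:
    """Hypothetical sketch of one RLRR-adapted linear layer (not the official code).

    Assumed form, our reading of Figure 1:
        y = x @ (W + diag(s_left) @ W @ diag(s_right)) + b + f
    W (d_in x d_out) and b stay frozen; only s_left, s_right and f are trained.
    """

    def __init__(self, W, b, s_left, s_right, f):
        d_in, d_out = W.shape
        assert s_left.shape == (d_in,) and s_right.shape == (d_out,)
        assert b.shape == (d_out,) and f.shape == (d_out,)
        self.W, self.b = W, b                                    # frozen pre-trained parameters
        self.s_left, self.s_right, self.f = s_left, s_right, f   # trainable parameters

    def delta_weight(self):
        # diag(s_left) @ W @ diag(s_right), computed by broadcasting
        return self.s_left[:, None] * self.W * self.s_right[None, :]

    def __call__(self, x):
        # Residual design: frozen W plus its row/column-rescaled copy, then shift by f
        return x @ (self.W + self.delta_weight()) + self.b + self.f

    def num_trainable(self):
        return self.s_left.size + self.s_right.size + self.f.size


d_in, d_out = 768, 3072                      # hypothetical FFN projection shape (ViT-B/16-like)
rng = np.random.default_rng(0)
W = rng.standard_normal((d_in, d_out)) * 0.02
b = np.zeros(d_out)
# One assumed initialization: s_left = 0, s_right = 1, f = 0 makes delta_weight() zero,
# so the layer starts exactly at the pre-trained one while s_left can still receive gradient.
layer = RLRRLinear(W, b, np.zeros(d_in), np.ones(d_out), np.zeros(d_out))

x = rng.standard_normal((2, d_in))
print(np.allclose(layer(x), x @ W + b))      # True
print(layer.num_trainable(), W.size)         # 6912 2359296
```

For the 768 x 3072 shape used here, the layer adds 6,912 trainable values against 2,359,296 frozen weights. These numbers follow purely from the assumed parameterization and are not the parameter counts reported in the paper.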