SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values

Chengwei Sun, Jiwei Wei, Yujia Wu, Yiming Shi, Shiyuan He, Zeyu Ma, Ning Xie, Yang Yang

TL;DR

SVFit addresses the memory cost of fine-tuning large pre-trained models by using singular value decomposition (SVD) to initialize low-rank adapters. It decomposes each weight matrix as $W = W_r + W_e$ and trains only the top-$r$ singular values in $W_r$, keeping the residual $W_e$ and the associated singular subspaces frozen, which sharply reduces the number of trainable parameters. Empirically, SVFit matches or outperforms LoRA and PiSSA on natural language understanding, image classification, and DreamBooth subject-driven generation tasks with roughly 16× fewer trainable parameters. This makes it practical for resource-constrained deployment across diverse downstream tasks, with potential for dynamic budget allocation and extension to more complex domains.

Abstract

Large pre-trained models (LPMs) have demonstrated exceptional performance in diverse natural language processing and computer vision tasks. However, fully fine-tuning these models poses substantial memory challenges, particularly in resource-constrained environments. Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, mitigate this issue by adjusting only a small subset of parameters. Nevertheless, these methods typically employ random initialization for low-rank matrices, which can lead to inefficiencies in gradient descent and diminished generalizability due to suboptimal starting points. To address these limitations, we propose SVFit, a novel PEFT approach that leverages singular value decomposition (SVD) to initialize low-rank matrices using critical singular values as trainable parameters. Specifically, SVFit performs SVD on the pre-trained weight matrix to obtain the best rank-r approximation matrix, emphasizing the most critical singular values that capture over 99% of the matrix's information. These top-r singular values are then used as trainable parameters to scale the fundamental subspaces of the matrix, facilitating rapid domain adaptation. Extensive experiments across various pre-trained models in natural language understanding, text-to-image generation, and image classification tasks reveal that SVFit outperforms LoRA while requiring 16 times fewer trainable parameters.
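
The decomposition described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `svfit_decompose` and `svfit_forward`, the random stand-in weight, and the dimensions are all hypothetical. It shows that, before any training, the split $W = W_r + W_e$ reproduces the original weight, and it compares the trainable-parameter count per matrix against a LoRA adapter of the same rank, which has $r(d_1 + d_2)$ parameters. The paper's 16× figure depends on its own experimental settings and is not reproduced here.

```python
import numpy as np

def svfit_decompose(W, r):
    """Split W into its top-r SVD factors and a residual W_e (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r, s_r, Vt_r = U[:, :r], s[:r].copy(), Vt[:r, :]
    # Residual built from the remaining singular values; it would stay frozen.
    W_e = W - (U_r * s_r) @ Vt_r
    return U_r, s_r, Vt_r, W_e

def svfit_forward(x, U_r, s_r, Vt_r, W_e):
    """Compute (U_r diag(s_r) V_r^T + W_e) x; only s_r would be trainable."""
    return (U_r * s_r) @ (Vt_r @ x) + W_e @ x

rng = np.random.default_rng(0)
d1, d2, r = 64, 96, 8                      # illustrative sizes, not from the paper
W = rng.standard_normal((d1, d2))          # random stand-in for a pre-trained weight
U_r, s_r, Vt_r, W_e = svfit_decompose(W, r)

x = rng.standard_normal(d2)
y = svfit_forward(x, U_r, s_r, Vt_r, W_e)  # equals W @ x before any update to s_r

trainable_svfit = s_r.size                 # r singular values per matrix
trainable_lora = r * (d1 + d2)             # LoRA at the same rank: A is r x d2, B is d1 x r
print(trainable_svfit, trainable_lora)
```

In a real training setup, `s_r` would be registered as the only parameter receiving gradients, while `U_r`, `Vt_r`, and `W_e` would be stored as constants.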

Paper Structure (16 sections, 11 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: SVD-based reconstructions of the Fishstar image ($256 \times 256$). The first row shows reconstructions from the 8, 16, 32, 64, 128, and 256 largest singular values ($r = 8$, $r = 16$, $r = 32$, $r = 64$, $r = 128$, $r = 256$). The second row shows reconstructions from the 8, 16, 32, 64, 128, and 256 smallest singular values. The comparison highlights that the dominant singular values are what preserve image quality, while the smallest singular values contribute little to the overall structure.
  • Figure 2: Visual comparison among LoRA, PiSSA, and SVFit. (a) LoRA introduces two low-rank matrices $A$ and $B$ to approximate weight updates during fine-tuning. (b) PiSSA initializes $A$ and $B$ with the principal components of the pre-trained weight $W$, freezing the residual matrix during fine-tuning. (c) SVFit initializes low-rank matrices through SVD of $W$ and trains only the most significant top-$r$ singular values (for simplicity, $d_{1} \ll d_{2}$ is assumed).
  • Figure 3: SVD of the pre-trained weight matrix $W \in \mathbb{R}^{d_{1} \times d_{2}}$ and its fundamental subspaces. $W$ is decomposed into singular values and vectors as $W = U \text{diag}(\Sigma) V^{T}$, which yields a rank-$r$ approximation matrix $W_{r}$ and a residual matrix $W_{e}$. The range space of $W$ is spanned by $U_{r}$ and its null space by $V_{e}$; the range space of $W^{T}$ is spanned by $V_{r}$ and its null space by $U_{e}$.
  • Figure 4: Randomly selected samples from DreamBooth, LoRA, and SVFit for the subject-driven generation task.
  • Figure 5: Performance of SVFit fine-tuning of the ViT-base model on image classification tasks at different parameter budgets. The $x$-axis shows the rank, and the $y$-axis shows the evaluation metric for each dataset.
  • ...and 1 more figure
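
The comparison in Figure 1 can be approximated numerically. The sketch below is an illustration under assumptions, not a reproduction of the figure: it uses a synthetic $256 \times 256$ matrix with a fast-decaying spectrum as a stand-in for the Fishstar image, and it measures "information" as the fraction of squared singular values retained, which is an assumed definition that this page does not spell out. The helper `reconstruct` is hypothetical.

```python
import numpy as np

def reconstruct(U, s, Vt, idx):
    """Rebuild a matrix from the singular triplets selected by idx (hypothetical helper)."""
    return (U[:, idx] * s[idx]) @ Vt[idx, :]

rng = np.random.default_rng(0)
n = 256
# Synthetic stand-in for an image: orthogonal factors with a fast-decaying spectrum.
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
spectrum = 1.0 / np.arange(1, n + 1) ** 2
X = (Q1 * spectrum) @ Q2.T

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # s is sorted in descending order
total = np.linalg.norm(X)                          # Frobenius norm

results = {}
for k in (8, 16, 32, 64, 128, 256):
    top = reconstruct(U, s, Vt, np.arange(k))             # k largest singular values
    bottom = reconstruct(U, s, Vt, np.arange(n - k, n))   # k smallest singular values
    err_top = np.linalg.norm(X - top) / total
    err_bottom = np.linalg.norm(X - bottom) / total
    energy = np.sum(s[:k] ** 2) / np.sum(s ** 2)          # assumed "information" measure
    results[k] = (err_top, err_bottom, energy)
    print(k, err_top, err_bottom, energy)
```

For a spectrum that decays this quickly, the top-$k$ reconstruction error is small even at $k = 8$, while the bottom-$k$ reconstruction stays far from the original until nearly all singular values are included. How quickly real images or weight matrices behave this way depends on their actual spectra.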

Theorems & Definitions (3)

  • Definition 1: Range space
  • Definition 2: Null space
  • proof
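
The subspace statements in Figure 3 and the range space and null space definitions can be checked numerically. The sketch below is an illustration under one explicit assumption that is not stated on this page: the matrix is constructed to have rank exactly $r$, so the statements hold for $W$ itself. For a full-rank matrix, the same relations hold for the rank-$r$ approximation $W_r$ rather than for $W$. All sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r = 6, 10, 3
# Assumption: W has rank exactly r (product of a d1 x r and an r x d2 factor).
W = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))

U, s, Vt = np.linalg.svd(W, full_matrices=True)
U_r, U_e = U[:, :r], U[:, r:]          # left singular vectors: range / complement
V_r, V_e = Vt[:r, :].T, Vt[r:, :].T    # right singular vectors: row space / null space

# Null space of W is spanned by V_e: W v = 0 for every column v of V_e.
null_W = np.abs(W @ V_e).max()
# Null space of W^T is spanned by U_e: W^T u = 0 for every column u of U_e.
null_Wt = np.abs(W.T @ U_e).max()

# Range of W is spanned by U_r: projecting W x onto span(U_r) leaves it unchanged.
x = rng.standard_normal(d2)
y = W @ x
range_resid = np.abs(U_r @ (U_r.T @ y) - y).max()

print(null_W, null_Wt, range_resid)    # all close to machine precision
```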