SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors

Vijay Lingam; Atula Tejaswi; Aditya Vavre; Aneesh Shetty; Gautham Krishna Gudur; Joydeep Ghosh; Alex Dimakis; Eunsol Choi; Aleksandar Bojchevski; Sujay Sanghavi

TL;DR

SVFT addresses the performance gap in parameter-efficient fine-tuning by tying weight updates to the singular vectors of the pre-trained matrix. Given the decomposition $W_0 = U Σ V^T$, it applies the update $ΔW = U M V^T$ with a sparse matrix $M$, keeping $U$ and $V$ fixed while training only the nonzero entries of $M$; four sparsity patterns (Plain, Banded, Random, Top-$k$) control expressivity. Across language and vision benchmarks, SVFT recovers up to 96% of full fine-tuning performance while training only 0.006 to 0.25% of parameters, outperforming existing PEFT methods that recover at most 85% while training 0.03 to 0.8% of parameters. Theoretical results show that SVFT can induce higher-rank perturbations than prior PEFT techniques for the same parameter budget; memory considerations for practical deployment are also discussed.
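To make the rank claim concrete, here is a minimal NumPy sketch of the update $ΔW = U M V^T$ next to a LoRA-style low-rank update. It is an illustration under stated assumptions, not the authors' implementation: the layer shape is hypothetical, random values stand in for pre-trained weights and learned coefficients, and the Plain pattern is assumed to mean a diagonal $M$.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 64, 48                      # hypothetical layer shape
W0 = rng.standard_normal((d_out, d_in))   # stand-in for a pre-trained weight

# Thin SVD: W0 = U @ diag(S) @ Vt. U and Vt stay frozen.
U, S, Vt = np.linalg.svd(W0, full_matrices=False)
k = S.shape[0]                            # number of singular values, min(d_out, d_in)

# Assumed "Plain" pattern: M is diagonal, so there are k trainable scalars.
m = rng.standard_normal(k)                # stand-in for learned coefficients
delta_svft = U @ np.diag(m) @ Vt          # Delta W = U M V^T

# LoRA-style update of rank r, which costs r * (d_out + d_in) parameters.
r = 1
B = rng.standard_normal((d_out, r))
A = rng.standard_normal((r, d_in))
delta_lora = B @ A

svft_params = k
lora_params = r * (d_out + d_in)
svft_rank = int(np.linalg.matrix_rank(delta_svft))
lora_rank = int(np.linalg.matrix_rank(delta_lora))

print("SVFT (diagonal M):", svft_params, "params, rank", svft_rank)
print("LoRA (r=1):       ", lora_params, "params, rank", lora_rank)
```

Because $U$ and $V$ have orthonormal columns, the rank of $U M V^T$ equals the rank of $M$, so a diagonal $M$ with all nonzero entries yields a full-rank update from only `k` scalars, whereas the rank of a LoRA-style update is capped at `r`.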

Abstract

Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze pre-trained model weights $W$ and inject learnable matrices $ΔW$. These $ΔW$ matrices are structured for efficient parameterization, often using techniques like low-rank approximations or scaling vectors. However, these methods typically show a performance gap compared to full fine-tuning. Although recent PEFT methods have narrowed this gap, they do so at the cost of additional learnable parameters. We propose SVFT, a simple approach that fundamentally differs from existing methods: the structure imposed on $ΔW$ depends on the specific weight matrix $W$. Specifically, SVFT updates $W$ as a sparse combination of outer products of its singular vectors, training only the coefficients (scales) of these sparse combinations. This approach allows fine-grained control over expressivity through the number of coefficients. Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006 to 0.25% of parameters, outperforming existing methods that only recover up to 85% performance using 0.03 to 0.8% of the trainable parameter budget.
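Read literally, "a sparse combination of outer products of its singular vectors" can be written as below. The index set $\Omega$ of trainable positions and the entry names $m_{ij}$ are notation introduced here for illustration and may differ from the paper's own; $u_i$ and $v_j$ denote the left and right singular vectors of $W$, collected as the columns of $U$ and $V$.

```latex
\Delta W \;=\; U M V^{\top} \;=\; \sum_{(i,j) \in \Omega} m_{ij}\, u_i v_j^{\top},
\qquad M_{ij} = 0 \quad \text{for } (i,j) \notin \Omega
```

Under this reading the number of trainable parameters for one weight matrix is $|\Omega|$, which is the knob that gives fine-grained control over expressivity.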

Paper Structure (31 sections, 3 equations, 5 figures, 14 tables)

This paper contains 31 sections, 3 equations, 5 figures, 14 tables.

Introduction
Related Work
Method
SVFT Formulation
Properties of SVFT
Experiments
Base Models
Datasets
Results
Performance on Language Tasks
Natural Language Generation.
Commonsense Reasoning.
Natural Language Understanding.
Performance on Vision Tasks
Contribution of Each Weight Type
...and 16 more sections

Figures (5)

Figure 1: Performance versus total trainable parameters for GSM-8K (left) and Commonsense Reasoning (right) on Gemma-2B. $\textsc{SVFT}_{d=16}^{B/R}$ outperforms $\text{DoRA}_{r=8/16}$ with 75% fewer trainable parameters.
Figure 2: Schematic comparison of LoRA, VeRA, DoRA, and SVFT (left to right).
Figure 3: An Overview of SVFT. The original weights ${\bm{W}}$ are decomposed into ${\bm{U}}, \mathbf{\Sigma}, {\bm{V}}$. Here, ${\bm{M}}$ contains all the trainable parameters, which can be configured into patterns such as Plain, Random, Banded, and Top-$k$, represented by patterns of trainable (orange) and zero (gray) elements.
Figure 4: Performance variation with $\textsc{SVFT}^{B}_{d}$ based on the adapted weight matrices -- GSM-8K with Gemma-2B. Adapting more target weight types results in greater gains in performance. Interestingly, for a fixed parameter budget, adapting ${\bm{U}}$ and ${\bm{D}}$ weight types gives greater lifts than adapting ${\bm{Q}}$ and ${\bm{V}}$.
Figure 5: Performance versus total trainable parameters for GSM-8K on Gemma-7B (left) and LLaMA-3-8B (right).
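
The sparsity patterns named in the Figure 3 caption can be pictured as boolean masks over $M$, where `True` marks a trainable entry. The sketch below is a guess based on the pattern names alone: the symmetric band, the meaning of the hypothetical parameter `d`, and uniform sampling for the Random pattern are all assumptions. Top-$k$ is omitted because its selection criterion is not stated here.

```python
import numpy as np

def plain_mask(k):
    """Diagonal only: one coefficient per pair (u_i, v_i)."""
    return np.eye(k, dtype=bool)

def banded_mask(k, d):
    """Main diagonal plus d off-diagonals on each side (assumed symmetric)."""
    i, j = np.indices((k, k))
    return np.abs(i - j) <= d

def random_mask(k, n, seed=0):
    """n positions drawn uniformly without replacement from the k x k grid."""
    rng = np.random.default_rng(seed)
    chosen = rng.choice(k * k, size=n, replace=False)
    mask = np.zeros((k, k), dtype=bool)
    mask[np.unravel_index(chosen, mask.shape)] = True
    return mask

k = 8
masks = {
    "plain": plain_mask(k),
    "banded": banded_mask(k, 2),
    "random": random_mask(k, 10),
}
for name, mask in masks.items():
    # The count of True entries is the trainable-parameter count for M.
    print(name, int(mask.sum()))
```

In training, only the entries of $M$ where the mask is `True` would receive gradients; all other entries stay at zero.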
