SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors

Vijay Lingam, Atula Tejaswi, Aditya Vavre, Aneesh Shetty, Gautham Krishna Gudur, Joydeep Ghosh, Alex Dimakis, Eunsol Choi, Aleksandar Bojchevski, Sujay Sanghavi

TL;DR

SVFT addresses the performance gap in parameter-efficient fine-tuning by tying weight updates to the singular vectors of the pre-trained matrix. Given the decomposition $W_0 = U \Sigma V^T$, it applies an update $\Delta W = U M V^T$, where $M$ is sparse; $U$ and $V$ stay fixed and only the nonzero entries of $M$ are trained. Four sparsity patterns (Plain, Banded, Random, Top-$k$) control expressivity. Across language and vision benchmarks, SVFT recovers up to 96% of full fine-tuning performance while training only $0.006\%$ to $0.25\%$ of parameters, outperforming existing PEFT methods that recover at most 85% while training $0.03\%$ to $0.8\%$ of parameters. Theoretical results show that SVFT can induce higher-rank perturbations than prior PEFT techniques for the same parameter budget, and the paper also discusses memory considerations for practical deployment.

Abstract

Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze pre-trained model weights \(W\) and inject learnable matrices \(ΔW\). These \(ΔW\) matrices are structured for efficient parameterization, often using techniques like low-rank approximations or scaling vectors. However, these methods typically show a performance gap compared to full fine-tuning. Although recent PEFT methods have narrowed this gap, they do so at the cost of additional learnable parameters. We propose SVFT, a simple approach that fundamentally differs from existing methods: the structure imposed on \(ΔW\) depends on the specific weight matrix \(W\). Specifically, SVFT updates \(W\) as a sparse combination of outer products of its singular vectors, training only the coefficients (scales) of these sparse combinations. This approach allows fine-grained control over expressivity through the number of coefficients. Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006 to 0.25% of parameters, outperforming existing methods that only recover up to 85% performance using 0.03 to 0.8% of the trainable parameter budget.
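
The update described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the matrix size, the random stand-in weights, and the variable names are all assumptions made for illustration. It builds the Plain (diagonal) variant of $M$, checks that $U M V^T$ equals a weighted sum of outer products of singular vectors, and compares the rank of the resulting update with a rank-1 LoRA-style update that uses twice as many parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W0 = rng.standard_normal((d, d))  # stand-in for a pre-trained weight matrix

# Fixed factors: W0 = U @ diag(S) @ Vt. Neither U nor Vt is trained.
U, S, Vt = np.linalg.svd(W0, full_matrices=False)

# Plain pattern: only the diagonal of M is trainable (d coefficients).
coeffs = rng.standard_normal(d)   # stand-in for learned coefficient values
M = np.diag(coeffs)
delta_W = U @ M @ Vt

# The same update written as a weighted sum of outer products u_i v_i^T.
outer_sum = sum(coeffs[i] * np.outer(U[:, i], Vt[i, :]) for i in range(d))

# Rank-1 LoRA-style update for comparison: 2*d parameters, rank at most 1.
B = rng.standard_normal((d, 1))
A = rng.standard_normal((1, d))
lora_delta = B @ A

svft_rank = np.linalg.matrix_rank(delta_W)
lora_rank = np.linalg.matrix_rank(lora_delta)
print(svft_rank, lora_rank)
```

With nonzero diagonal coefficients, the SVFT-style update here has full rank from only `d` trainable values, whereas the LoRA-style product is limited to rank 1 by construction. Because $U$ and $V$ are shared with $W_0$, the adapted weight can also be written as $U(\Sigma + M)V^T$.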

Paper Structure

This paper contains 31 sections, 3 equations, 5 figures, and 14 tables.

Figures (5)

  • Figure 1: Performance vs. total trainable parameters for GSM-8K (left) and Commonsense Reasoning (right) on Gemma-2B. $\textsc{SVFT}_{d=16}^{B/R}$ outperforms $\text{DoRA}_{r=8/16}$ with 75% fewer trainable parameters.
  • Figure 2: Schematic comparison of LoRA, VeRA, DoRA, and SVFT (left to right).
  • Figure 3: An Overview of SVFT. The original weights ${\bm{W}}$ are decomposed into ${\bm{U}}, \mathbf{\Sigma}, {\bm{V}}$. Here, ${\bm{M}}$ contains all the trainable parameters, which can be configured into patterns such as Plain, Random, Banded, and Top-$k$, represented by patterns of trainable (orange) and zero (gray) elements.
  • Figure 4: Performance variation with $\textsc{SVFT}^{B}_{d}$ as a function of the adapted weight matrices (GSM-8K with Gemma-2B). Adapting more target weight types yields larger performance gains. Interestingly, for a fixed parameter budget, adapting the ${\bm{U}}$ and ${\bm{D}}$ (up- and down-projection) weight types gives greater lifts than adapting ${\bm{Q}}$ and ${\bm{V}}$ (query and value). Here ${\bm{U}}$ and ${\bm{V}}$ name weight types, not the singular-vector matrices.
  • Figure 5: Performance versus total trainable parameters for GSM-8K on Gemma-7B (left) and LLaMA-3-8B (right).
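
The sparsity patterns named in the Figure 3 caption can be pictured as boolean masks over ${\bm{M}}$, where `True` marks a trainable entry and `False` a fixed zero. The sketch below is an illustration under assumptions, not the paper's code: the helper names are hypothetical, the Banded mask assumes `band` off-diagonals on each side of the main diagonal, and the Random mask assumes `k` positions drawn uniformly without replacement. Top-$k$ is left out because its selection criterion is not stated in this section.

```python
import numpy as np

def plain_mask(n):
    """Only the main diagonal is trainable."""
    return np.eye(n, dtype=bool)

def banded_mask(n, band):
    """Main diagonal plus `band` off-diagonals on each side (assumed layout)."""
    i, j = np.indices((n, n))
    return np.abs(i - j) <= band

def random_mask(n, k, seed=0):
    """k trainable positions chosen uniformly at random (assumed sampling)."""
    rng = np.random.default_rng(seed)
    flat = np.zeros(n * n, dtype=bool)
    flat[rng.choice(n * n, size=k, replace=False)] = True
    return flat.reshape(n, n)

n = 6
masks = {
    "plain": plain_mask(n),
    "banded": banded_mask(n, band=1),
    "random": random_mask(n, k=10),
}
# Number of trainable coefficients per pattern.
counts = {name: int(mask.sum()) for name, mask in masks.items()}
print(counts)  # {'plain': 6, 'banded': 16, 'random': 10}
```

The count of `True` entries is the trainable-parameter budget for one adapted matrix, which is how the choice of pattern trades parameters for expressivity.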