SARA: Singular-Value Based Adaptive Low-Rank Adaption

Jihao Gu, Shuai Chen, Zelin Wang, Yibo Zhang, Ping Gong

TL;DR

This work targets parameter-efficient fine-tuning for large pre-trained models by uncovering layer-specific intrinsic ranks via singular-value decomposition and using this insight to drive adaptive low-rank adaptations. The authors introduce SARA, which computes a per-layer rank k from the pre-trained weights and adds a truncated singular-value matrix in parallel to the base weights, thereby enhancing LoRA without runtime overhead. They further propose Mo-SARA, a Mixture-of-SARA approach that trains multiple parallel singular-value sets with a lightweight router to massively reduce trainable parameters while preserving performance. Across 15 datasets spanning math reasoning, commonsense inference, and end-to-end tasks, SARA and Mo-SARA demonstrate superior or competitive accuracy with substantially fewer trainable parameters, addressing inter-layer importance and achieving efficient, adaptive fine-tuning.

Abstract

As the number of parameters in large pre-trained models grows, LoRA has become a widely used parameter-efficient fine-tuning (PEFT) method because it adds no inference overhead. LoRA assumes that weight changes during fine-tuning can be approximated by low-rank matrices. However, the rank values must be manually verified for each downstream task, and they cannot accommodate the varying importance of different layers in the model. In this work, we first use SVD to analyze the relationship between the performance of different layers and their ranks. Based on this analysis, we design Singular-Value Based Adaptive Low-Rank Adaption (SARA), which adaptively finds the rank during initialization by performing SVD on the pre-trained weights. Additionally, we explore Mixture-of-SARA (Mo-SARA), which significantly reduces the number of trainable parameters by fine-tuning only multiple parallel sets of singular values controlled by a router. Extensive experiments on various complex tasks demonstrate the simplicity and parameter efficiency of our methods, which effectively and adaptively find the most suitable rank for each layer of each model.
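
The rank-selection idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes $k$ to be the smallest number of leading singular values whose sum reaches a proportion $m$ of the total sum, which is how the Figure 1 caption describes it. The function name `select_rank` and the toy matrices are hypothetical.

```python
import numpy as np


def select_rank(weight: np.ndarray, m: float) -> int:
    """Smallest k such that the top-k singular values reach proportion m of their sum.

    Assumption: this mirrors the caption-level description only; the paper's
    exact rule may differ in details such as rounding or tie handling.
    """
    # Singular values are returned in descending order.
    s = np.linalg.svd(weight, compute_uv=False)
    cumulative = np.cumsum(s) / np.sum(s)
    # Index of the first cumulative proportion >= m, converted to a count.
    k = int(np.searchsorted(cumulative, m, side="left")) + 1
    # Guard against m slightly above the final cumulative value (rounding).
    return min(k, len(s))


# Two toy "layers" with different singular-value spectra get different ranks.
flat_layer = np.diag([4.0, 3.0, 2.0, 1.0])      # slowly decaying spectrum
peaked_layer = np.diag([9.0, 0.5, 0.3, 0.2])    # one dominant direction

rank_flat = select_rank(flat_layer, m=0.5)
rank_peaked = select_rank(peaked_layer, m=0.5)
```

With the same threshold, the layer whose spectrum is dominated by one direction receives a smaller rank than the layer with a flatter spectrum, which is the per-layer adaptivity the abstract refers to.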

Paper Structure

This paper contains 27 sections, 6 equations, 9 figures, 12 tables, and 1 algorithm.

Figures (9)

  • Figure 1: An overview of our methods: (a) performing SVD on the pre-trained weights and determining the number $k$ of singular values that account for a proportion threshold $m$ of the total sum of singular values; (b) adding a truncated singular-value matrix to the pre-trained weights based on $k$; and (c) the extreme method of fine-tuning only a mixture of parallel sets of singular values. $\Lambda$ and $v$, as diagonal matrices, each require only a one-dimensional vector for storage.
  • Figure 2: The impact of different layers on the average accuracy of mathematical reasoning tasks, and the corresponding $k$ (mean of the values obtained from SVD of the Q and V matrices).
  • Figure 3: Average accuracy of SARA and LoRA methods across layers in mathematical reasoning tasks.
  • Figure 4: Average accuracy of the SARA and LoRA methods on mathematical reasoning tasks with different numbers of trainable parameters. The thresholds for determining $k$ in the SARA method [0.006, 0.01, 0.016, 0.02] and the $r$ values used to adjust the parameter count in the LoRA method [5, 10, 15, 20] are indicated in the figure.
  • Figure 5: Average accuracy of Mo-SARA (1 head) on mathematical reasoning tasks under different thresholds; the bar chart displays the trainable parameters above.
  • ...and 4 more figures
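
The storage remark in the Figure 1 caption rests on a general fact: multiplying by a diagonal matrix is the same as scaling rows by a vector, so only the diagonal entries need to be kept. The sketch below illustrates that fact with NumPy. The names `scale_rows`, `lam`, and `A` and the toy sizes are hypothetical and are not taken from the paper.

```python
import numpy as np


def scale_rows(lam: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Compute diag(lam) @ A without materialising the k x k diagonal matrix."""
    # Broadcasting multiplies row i of A by lam[i].
    return lam[:, None] * A


rng = np.random.default_rng(0)
k, d = 4, 6
lam = rng.standard_normal(k)       # k diagonal entries, stored as a 1-D vector
A = rng.standard_normal((k, d))    # hypothetical k x d factor

dense = np.diag(lam) @ A           # explicit diagonal matrix: k * k stored values
compact = scale_rows(lam, A)       # vector form: k stored values

same = np.allclose(dense, compact)
stored_dense, stored_compact = k * k, lam.size
```

Here the explicit diagonal form stores `k * k` numbers while the vector form stores `k`, and both give the same product.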