
RaSA: Rank-Sharing Low-Rank Adaptation

Zhiwei He, Zhaopeng Tu, Xing Wang, Xingyu Chen, Zhijie Wang, Jiahao Xu, Tian Liang, Wenxiang Jiao, Zhuosheng Zhang, Rui Wang

TL;DR

RaSA tackles the expressivity bottleneck of LoRA by introducing partial rank sharing across layers, forming a shared rank pool with layer-specific weighting. With $L$ layers, each of rank $r$ and each contributing $k$ of its ranks to the shared pool, the method increases the effective update rank per layer from $r$ to $r - k + Lk$ while preserving parameter efficiency and allowing the update to be merged back into the base model. The authors prove that RaSA's reconstruction error is bounded above by LoRA's and corroborate this with experiments on code generation and mathematical reasoning, where RaSA learns more efficiently and forgets less. Scaling analyses show that these gains hold across model and data scales, indicating practical benefits for challenging downstream tasks in large language models.
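To make the rank arithmetic concrete, here is a minimal NumPy sketch of a RaSA layer update. It assumes the decomposition $\Delta W_i = \tilde{B}_i \tilde{A}_i + B_S D_i A_S$: per-layer private factors of rank $r-k$, shared factors $B_S$ and $A_S$ holding the pooled $Lk$ ranks, and a layer-specific diagonal weighting $D_i$. The function name `rasa_delta`, the layer count, and the matrix sizes are hypothetical illustration choices, not taken from the RaSA code base.

```python
import numpy as np

def rasa_delta(B_priv, A_priv, B_shared, A_shared, d_diag):
    """Assumed RaSA update for one layer: private rank-(r-k) part plus weighted shared pool."""
    # B_shared * d_diag scales column j of B_shared by d_diag[j], i.e. B_S @ diag(d_i)
    return B_priv @ A_priv + (B_shared * d_diag) @ A_shared

rng = np.random.default_rng(0)
L, r, k = 4, 8, 2        # layers, per-layer rank, ranks each layer gives to the pool
d_out, d_in = 64, 64     # hypothetical module shape

# Private factors: one rank-(r-k) pair per layer
B_priv = [rng.standard_normal((d_out, r - k)) for _ in range(L)]
A_priv = [rng.standard_normal((r - k, d_in)) for _ in range(L)]
# Shared pool: L*k ranks, visible to every layer
B_shared = rng.standard_normal((d_out, L * k))
A_shared = rng.standard_normal((L * k, d_in))
# Layer-specific diagonal weights over the pool (stored as vectors)
D = [rng.standard_normal(L * k) for _ in range(L)]

delta = [rasa_delta(B_priv[i], A_priv[i], B_shared, A_shared, D[i]) for i in range(L)]
ranks = [int(np.linalg.matrix_rank(dw)) for dw in delta]
print(ranks)  # with random factors, almost surely r - k + L*k = 14 per layer

# Parameter counts: LoRA vs. the assumed RaSA parameterization
lora_params = L * r * (d_out + d_in)
rasa_params = (L * (r - k) * (d_out + d_in)   # private factors
               + L * k * (d_out + d_in)       # shared B_S and A_S
               + L * L * k)                   # one length-Lk diagonal per layer
print(lora_params, rasa_params)

# Merging: the update is an ordinary matrix, so it folds into the frozen weight
W0 = rng.standard_normal((d_out, d_in))
x = rng.standard_normal(d_in)
W_merged = W0 + delta[0]
assert np.allclose(W_merged @ x, W0 @ x + delta[0] @ x)
```

Under this parameterization the only parameters beyond LoRA's are the $L \cdot Lk$ diagonal weights (32 in this toy setting), while every layer sees $r - k + Lk$ ranks instead of $r$.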

Abstract

Low-rank adaptation (LoRA) has been prominently employed for parameter-efficient fine-tuning of large language models (LLMs). However, the limited expressive capacity of LoRA, stemming from the low-rank constraint, has been recognized as a bottleneck, particularly in rigorous tasks like code generation and mathematical reasoning. To address this limitation, we introduce Rank-Sharing Low-Rank Adaptation (RaSA), an innovative extension that enhances the expressive capacity of LoRA by leveraging partial rank sharing across layers. By forming a shared rank pool and applying layer-specific weighting, RaSA effectively increases the number of ranks without augmenting parameter overhead. Our theoretically grounded and empirically validated approach demonstrates that RaSA not only maintains the core advantages of LoRA but also significantly boosts performance in challenging code and math tasks. Code, data and scripts are available at: https://github.com/zwhe99/RaSA.

Paper Structure

This paper contains 36 sections, 1 theorem, 19 equations, 9 figures, 2 tables.

Key Result

Theorem 3.1

$e_{\mathrm{rasa}(k)} \leq e_{\mathrm{lora}}$
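One way to see why this inequality holds is that every LoRA solution can be rewritten as a RaSA solution, so RaSA's minimum error can be no larger. The derivation below is a sketch of that containment argument, not a reproduction of the paper's proof. It assumes the parameterization $\Delta\mathbf{W}_i = \tilde{\mathbf{B}}_i \tilde{\mathbf{A}}_i + \mathbf{B}_S \mathbf{D}_i \mathbf{A}_S$ and takes the reconstruction error to be the minimum of $\sum_i \|\mathbf{M}_i - \Delta\mathbf{W}_i\|_F^2$ over the trainable factors, for some target matrices $\mathbf{M}_i$ (a symbol introduced here for illustration).

```latex
% Assumed RaSA update for layer i (L layers, per-layer rank r, k shared ranks per layer)
\Delta \mathbf{W}_i = \tilde{\mathbf{B}}_i \tilde{\mathbf{A}}_i + \mathbf{B}_S \mathbf{D}_i \mathbf{A}_S,
\quad
\tilde{\mathbf{B}}_i \in \mathbb{R}^{b \times (r-k)},\;
\tilde{\mathbf{A}}_i \in \mathbb{R}^{(r-k) \times a},\;
\mathbf{B}_S \in \mathbb{R}^{b \times Lk},\;
\mathbf{A}_S \in \mathbb{R}^{Lk \times a},\;
\mathbf{D}_i = \mathrm{diag}(\mathbf{d}_i)

% Split any LoRA update into its first r-k ranks and its last k ranks
\mathbf{B}_i \mathbf{A}_i
  = \mathbf{B}_i^{(1)} \mathbf{A}_i^{(1)} + \mathbf{B}_i^{(2)} \mathbf{A}_i^{(2)},
\quad
\mathbf{B}_i^{(2)} \in \mathbb{R}^{b \times k},\;
\mathbf{A}_i^{(2)} \in \mathbb{R}^{k \times a}

% Put the private parts in the layer factors and the last k ranks of every layer in the pool
\tilde{\mathbf{B}}_i = \mathbf{B}_i^{(1)},\quad
\tilde{\mathbf{A}}_i = \mathbf{A}_i^{(1)},\quad
\mathbf{B}_S = \bigl[\mathbf{B}_1^{(2)}, \dots, \mathbf{B}_L^{(2)}\bigr],\quad
\mathbf{A}_S = \bigl[\mathbf{A}_1^{(2)}; \dots; \mathbf{A}_L^{(2)}\bigr]

% Let D_i select only layer i's block: ones at positions (i-1)k+1, ..., ik, zeros elsewhere
\mathbf{D}_i = \mathrm{diag}\bigl(\mathbf{0}_{(i-1)k},\, \mathbf{1}_k,\, \mathbf{0}_{(L-i)k}\bigr)
\;\Longrightarrow\;
\mathbf{B}_S \mathbf{D}_i \mathbf{A}_S = \mathbf{B}_i^{(2)} \mathbf{A}_i^{(2)}

% Hence RaSA reproduces the LoRA update exactly, and minimizing over a larger set gives
\Delta \mathbf{W}_i^{\mathrm{rasa}} = \mathbf{B}_i \mathbf{A}_i \;\; \forall i
\;\Longrightarrow\;
e_{\mathrm{rasa}(k)} \le e_{\mathrm{lora}}
```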

Figures (9)

  • Figure 1: Decomposition of the update matrix $\Delta{\bm{W}}_i$ in LoRA and RaSA, where $i$ is the layer index.
  • Figure 2: Reconstruction error curves of RaSA ($r=8, k=1$) during coordinate descent. For comparison, we also plot the minimum reconstruction error of LoRA, computed via SVD.
  • Figure 3: Reconstruction error of RaSA and LoRA as a function of the number of shared ranks $k$. For comparison, we also plot the minimum reconstruction error of LoRA, computed via SVD. Results are averaged across all linear modules in the model.
  • Figure 4: RaSA learns more and faster than LoRA. Training curves of LoRA and RaSA with different ranks. RaSA consistently outperforms LoRA with the same rank across models and tasks.
  • Figure 5: RaSA forgets less than LoRA. The y-axis shows the average prediction accuracy on three benchmarks used to measure forgetting; higher accuracy means less forgetting.
  • ...and 4 more figures
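Figures 2 and 3 rest on two computations: LoRA's minimum reconstruction error, which follows from truncated SVD (the Eckart–Young theorem), and RaSA's error under coordinate descent. The sketch below is one possible block-coordinate scheme, not the authors' procedure. It assumes the objective $\sum_i \|M_i - P_i - B_S D_i A_S\|_F^2$ with $P_i$ of rank $r-k$, diagonal $D_i$, and per-layer targets $M_i$. Each step minimizes one block exactly (truncated SVD for $P_i$, linear least squares for $B_S$, $A_S$, and the diagonals), so the error cannot increase. The function names, synthetic targets, sizes, and sweep count are hypothetical.

```python
import numpy as np

def lora_mre(targets, r):
    """LoRA's minimum error (Eckart-Young): energy in singular values beyond the top r."""
    return sum(float(np.sum(np.linalg.svd(M, compute_uv=False)[r:] ** 2)) for M in targets)

def truncated(M, rank):
    """Best rank-`rank` approximation of M via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def rasa_error(targets, P, B, A, D):
    return sum(float(np.sum((M - P[i] - (B * D[i]) @ A) ** 2)) for i, M in enumerate(targets))

def coordinate_descent(targets, r, k, sweeps=20):
    L = len(targets)
    # Warm start at LoRA's optimum: top r-k components are private; layer i's next
    # k components fill pool slots [i*k, (i+1)*k), which a one-hot D_i selects.
    P, Bs, As = [], [], []
    D = [np.zeros(L * k) for _ in range(L)]
    for i, M in enumerate(targets):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        P.append((U[:, :r - k] * s[:r - k]) @ Vt[:r - k])
        Bs.append(U[:, r - k:r] * s[r - k:r])
        As.append(Vt[r - k:r])
        D[i][i * k:(i + 1) * k] = 1.0
    B, A = np.hstack(Bs), np.vstack(As)
    history = [rasa_error(targets, P, B, A, D)]
    for _ in range(sweeps):
        # 1) Private parts: best rank-(r-k) fit of what the shared pool leaves unexplained
        P = [truncated(M - (B * D[i]) @ A, r - k) for i, M in enumerate(targets)]
        E = [M - P[i] for i, M in enumerate(targets)]  # residuals the pool must fit
        # 2) Shared B: least squares for sum_i ||E_i - B C_i||^2 with C_i = D_i A
        C = [D[i][:, None] * A for i in range(L)]
        B = sum(E[i] @ C[i].T for i in range(L)) @ np.linalg.pinv(sum(c @ c.T for c in C))
        # 3) Shared A: least squares for sum_i ||E_i - G_i A||^2 with G_i = B D_i
        G = [B * D[i] for i in range(L)]
        A = np.linalg.pinv(sum(g.T @ g for g in G)) @ sum(G[i].T @ E[i] for i in range(L))
        # 4) Diagonals: E_i ~ sum_j d_j b_j a_j^T is linear in d; Gram = (B^T B) * (A A^T)
        Q_inv = np.linalg.pinv((B.T @ B) * (A @ A.T))
        D = [Q_inv @ np.einsum("oj,oi,ji->j", B, E[i], A) for i in range(L)]
        history.append(rasa_error(targets, P, B, A, D))
    return history

rng = np.random.default_rng(0)
L, d, r, k = 4, 32, 8, 1  # r=8, k=1 mirrors the Figure 2 setting; other sizes are hypothetical
common = rng.standard_normal((d, 12)) @ rng.standard_normal((12, d))  # structure shared by layers
targets = [rng.uniform(0.5, 1.5) * common + 0.3 * rng.standard_normal((d, d)) for _ in range(L)]

hist = coordinate_descent(targets, r, k)
print(f"LoRA minimum error:        {lora_mre(targets, r):.3f}")
print(f"RaSA after {len(hist) - 1} sweeps:     {hist[-1]:.3f}")
```

Because the warm start already attains LoRA's minimum and no step can increase the error, the final RaSA error in this toy setting is at most LoRA's, which matches the direction of Theorem 3.1.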

Theorems & Definitions (2)

  • Theorem 3.1
  • Proof