RaSA: Rank-Sharing Low-Rank Adaptation
Zhiwei He, Zhaopeng Tu, Xing Wang, Xingyu Chen, Zhijie Wang, Jiahao Xu, Tian Liang, Wenxiang Jiao, Zhuosheng Zhang, Rui Wang
TL;DR
RaSA tackles the expressivity bottleneck of LoRA by sharing part of each layer's ranks across layers. The shared ranks form a common rank pool, which each layer reweights with its own layer-specific weights. This raises the effective rank of each layer's update from $r$ to $r - k + Lk$, where $L$ is the number of layers and $k$ is the number of ranks each layer contributes to the shared pool. It does so while preserving parameter efficiency and still allowing the update to be merged back into the base model. The authors prove that RaSA's reconstruction error is upper-bounded by LoRA's and corroborate this with experiments on code generation and mathematical reasoning tasks, where RaSA learns more efficiently and forgets less. Scaling analyses show that RaSA is robust across model and data scales, suggesting practical gains on challenging downstream tasks for large language models.
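The rank arithmetic can be checked numerically. The following is a hypothetical NumPy sketch, not the authors' implementation. It assumes that each layer keeps $r - k$ private ranks, that the remaining $k$ ranks of all $L$ layers are pooled into shared factors `B_S` and `A_S`, and that each layer reweights the pool with its own diagonal matrix $D_i$, so that $\Delta W_i = B_i A_i + B_S D_i A_S$. The function name `rasa_updates` and all sizes are illustrative.

```python
import numpy as np


def rasa_updates(L, r, k, a, b, seed=0):
    """Hypothetical sketch: per-layer updates dW_i = B_i A_i + B_S D_i A_S.

    Assumptions (not taken from the RaSA code): layer i keeps r - k private
    ranks (B_priv[i], A_priv[i]); the other k ranks of all L layers form a
    shared pool B_S (b x L*k), A_S (L*k x a); D_i is a per-layer diagonal.
    """
    rng = np.random.default_rng(seed)
    B_priv = rng.standard_normal((L, b, r - k))
    A_priv = rng.standard_normal((L, r - k, a))
    B_S = rng.standard_normal((b, L * k))  # shared pool, left factor
    A_S = rng.standard_normal((L * k, a))  # shared pool, right factor
    d = rng.standard_normal((L, L * k))    # diagonal entries of each D_i
    # (B_S * d[i]) scales column j of B_S by d[i, j], i.e. B_S @ diag(d[i]).
    return [B_priv[i] @ A_priv[i] + (B_S * d[i]) @ A_S for i in range(L)]


L, r, k, a, b = 4, 8, 2, 32, 32
deltas = rasa_updates(L, r, k, a, b)
print([int(np.linalg.matrix_rank(dw)) for dw in deltas])  # r - k + L*k = 14 per layer

# Plain LoRA with the same r: dW = B A has rank at most r = 8.
rng = np.random.default_rng(1)
lora_dw = rng.standard_normal((b, r)) @ rng.standard_normal((r, a))
print(int(np.linalg.matrix_rank(lora_dw)))  # 8

# Merging: the update folds into the frozen weight W, so inference cost is unchanged.
W = rng.standard_normal((b, a))
x = rng.standard_normal(a)
W_merged = W + deltas[0]
print(np.allclose(W @ x + deltas[0] @ x, W_merged @ x))  # True
```

With random factors and a nonzero diagonal, each $\Delta W_i$ generically has rank $\min(a, b, r - k + Lk)$, because it factors through an inner dimension of $r - k + Lk$. Trained factors may have lower rank. The diagonal $D_i$ lets every layer draw on all $Lk$ pooled ranks while weighting them differently.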
Abstract
Low-rank adaptation (LoRA) has been widely used for parameter-efficient fine-tuning of large language models (LLMs). However, the limited expressive capacity of LoRA, stemming from its low-rank constraint, has been recognized as a bottleneck, particularly in demanding tasks such as code generation and mathematical reasoning. To address this limitation, we introduce Rank-Sharing Low-Rank Adaptation (RaSA), an innovative extension that enhances the expressive capacity of LoRA by leveraging partial rank sharing across layers. By forming a shared rank pool and applying layer-specific weighting, RaSA effectively increases the number of ranks without increasing the parameter overhead. Our theoretically grounded and empirically validated approach demonstrates that RaSA not only maintains the core advantages of LoRA but also significantly boosts performance on challenging code and math tasks. Code, data, and scripts are available at: https://github.com/zwhe99/RaSA.
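The parameter-efficiency claim can be made concrete by counting trainable parameters under the same assumed structure. This structure has private ranks, a shared pool, and one diagonal $D_i$ per layer, applied to $L$ adapted matrices of shape $b \times a$. The function names and example sizes below are hypothetical. The sketch shows that the private and shared factors together cost exactly as much as LoRA's factors. The only additional parameters are the $L \cdot Lk$ diagonal weights, which are a negligible fraction of the total.

```python
def lora_params(L, r, a, b):
    # Each of the L layers trains B (b x r) and A (r x a).
    return L * r * (a + b)


def rasa_params(L, r, k, a, b):
    # Hypothetical accounting for the assumed RaSA structure.
    private = L * (r - k) * (a + b)  # layer-specific factors
    shared = L * k * (a + b)         # pool: B_S (b x L*k) and A_S (L*k x a)
    diagonals = L * (L * k)          # one L*k-entry diagonal D_i per layer
    return private + shared + diagonals


# Illustrative sizes only (not the paper's configuration).
L, r, k, a, b = 32, 8, 1, 4096, 4096
base, rasa = lora_params(L, r, a, b), rasa_params(L, r, k, a, b)
print(base, rasa, rasa - base)        # 2097152 2098176 1024
print(f"{(rasa - base) / base:.4%}")  # 0.0488%
print(r - k + L * k)                  # effective rank per layer: 39
```

In this illustrative setting, the rank of each layer's update grows from 8 to 39, while the parameter count grows by about 0.05%.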
