RoRA: Efficient Fine-Tuning of LLM with Reliability Optimization for Rank Adaptation

Jun Liu, Zhenglun Kong, Peiyan Dong, Changdi Yang, Xuan Shen, Pu Zhao, Hao Tang, Geng Yuan, Wei Niu, Wenbin Zhang, Xue Lin, Dong Huang, Yanzhi Wang

TL;DR

LoRA's benefit diminishes as rank grows due to rank-dependent gradient dynamics. RoRA introduces Optimization Scaling (OpS) with $\gamma = \alpha/\sqrt{r}$, providing theoretical and empirical support for rank-invariant gradient updates during fine-tuning. The approach yields consistent gains across multiple LLaMA variants and a heavily pruned SHEARED-LLAMA configuration, outperforming LoRA and DoRA in average accuracy (e.g., +6.5% on LLaMA-7B) and robustness. This work enables scalable, reliable PEFT for large language models, including challenging pruned settings, with practical impact on deployment efficiency and task performance.

Abstract

Fine-tuning helps large language models (LLMs) recover degraded information and enhance task performance. Although Low-Rank Adaptation (LoRA) is widely used and effective for fine-tuning, we have observed that its scaling factor can limit or even reduce performance as the rank size increases. To address this issue, we propose RoRA (Rank-adaptive Reliability Optimization), a simple yet effective method for optimizing LoRA's scaling factor. By replacing $\alpha/r$ with $\alpha/\sqrt{r}$, RoRA ensures improved performance as rank size increases. Moreover, RoRA enhances low-rank adaptation in fine-tuning uncompressed models and excels in the more challenging task of accuracy recovery when fine-tuning pruned models. Extensive experiments demonstrate the effectiveness of RoRA in fine-tuning both uncompressed and pruned models. RoRA surpasses the state-of-the-art (SOTA) in average accuracy and robustness on LLaMA-7B/13B, LLaMA2-7B, and LLaMA3-8B, specifically outperforming LoRA and DoRA by 6.5% and 2.9% on LLaMA-7B, respectively. In pruned model fine-tuning, RoRA shows significant advantages; for SHEARED-LLAMA-1.3, a LLaMA-7B with 81.4% pruning, RoRA achieves 5.7% higher average accuracy than LoRA and 3.9% higher than DoRA.
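
To make the change in scaling factor concrete, the following minimal sketch compares the two factors named in the abstract, $\alpha/r$ for LoRA and $\alpha/\sqrt{r}$ for RoRA, at several ranks. The function names and the value `alpha = 16.0` are illustrative assumptions, not taken from the paper; the sketch only shows how the multiplier applied to the low-rank update behaves as $r$ grows, not the full method.

```python
import math


def lora_scale(alpha: float, r: int) -> float:
    """Scaling factor alpha / r, as attributed to LoRA in the abstract."""
    return alpha / r


def rora_scale(alpha: float, r: int) -> float:
    """Scaling factor alpha / sqrt(r), as attributed to RoRA in the abstract."""
    return alpha / math.sqrt(r)


if __name__ == "__main__":
    alpha = 16.0  # hypothetical value chosen only for illustration
    for r in (8, 32, 128):
        # Both factors shrink as the rank grows, but alpha / sqrt(r)
        # shrinks more slowly than alpha / r.
        print(f"r={r:4d}  LoRA={lora_scale(alpha, r):.4f}  RoRA={rora_scale(alpha, r):.4f}")
```

With a fixed `alpha`, quadrupling the rank divides the LoRA factor by four but the RoRA factor only by two, so the low-rank update is damped less aggressively at high rank.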

Paper Structure

This paper contains 8 sections, 15 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Average accuracy of LoRA, DoRA, and RoRA (ours) at varying ranks for LLaMA-7B on the commonsense reasoning tasks.
  • Figure 2: Difference between LoRA and RoRA.
  • Figure 3: Comparison of the loss curves of LoRA, DoRA, and RoRA when fine-tuning LLaMA-7B with rank $r = 128$.