PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA

Sheng Wang, Boyang Xue, Jiacheng Ye, Jiyue Jiang, Liheng Chen, Lingpeng Kong, Chuan Wu

TL;DR

This paper tackles the growing cost of parameter-efficient fine-tuning when deploying multiple LoRAs by introducing PRoLoRA, an intra-layer sharing mechanism with four components: broadcast reduction, rotation enhancement, partially-sharing refinement, and rectified initialization. PRoLoRA reparameterizes low-rank updates into chunked, partially shared matrices, increasing effective rank while controlling trainable parameters, and adds near-free rotations to boost expressiveness. It preserves LoRA’s advantages and offers higher parameter efficiency, greater capacity, and broad applicability, as demonstrated by ablations and extensive instruction-tuning experiments. Empirical results on instruction-following benchmarks and larger models (e.g., LLaMA2-13B) show PRoLoRA consistently outperforms LoRA at the same budget and scales to bigger models, reducing storage and memory burdens in multi-LoRA deployments. The work suggests PRoLoRA as a resource-friendly alternative to LoRA with potential for integrating inter-layer sharing in future research.

Abstract

With the rapid scaling of large language models (LLMs), serving numerous low-rank adaptations (LoRAs) concurrently has become increasingly impractical, leading to unaffordable costs and necessitating more parameter-efficient finetuning methods. In this work, we introduce Partially Rotation-enhanced Low-Rank Adaptation (PRoLoRA), an intra-layer sharing mechanism comprising four essential components: broadcast reduction, rotation enhancement, partially-sharing refinement, and a rectified initialization strategy. As a superset of LoRA, PRoLoRA retains its advantages and effectively circumvents the drawbacks of peer parameter-sharing methods, offering superior model capacity, practical feasibility, and broad applicability. Empirical experiments demonstrate the markedly higher parameter efficiency of PRoLoRA under both a fixed parameter budget and a fixed performance target, as well as its scalability to larger LLMs. Notably, with half as many trainable parameters, PRoLoRA still outperforms LoRA on multiple instruction tuning datasets. An ablation study then validates the necessity of the individual components and highlights the superiority of PRoLoRA over three potential variants. We hope that this higher parameter efficiency can establish PRoLoRA as a resource-friendly alternative to LoRA.

Paper Structure (38 sections, 5 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Illustration of the original LoRA, our proposed PRoLoRA, and their intermediate states (i.e., CLoRA and RoLoRA). Here we set the rank $r$, the unshared rank $u$, and the sharing rates $m$ and $n$ of the $\mathbf{A}$ and $\mathbf{B}$ matrices to 4, 1, 2 and 3, respectively. Different shades of color in matrices $\mathbf{A}$ and $\mathbf{B}$ denote distinct ranks. The rotation arrows and center numbers indicate rotation directions and base strides, while dotted lines and higher transparency denote replicated or rotated weights, emphasizing that these weights do not contribute to the trainable parameters. Additionally, the center number of each matrix block represents the displacement of the $\mathbf{A}_i$ and $\mathbf{B}_i$ chunks relative to those of the top-left block (i.e., $\mathbf{A}_0$ and $\mathbf{B}_0$).
  • Figure 2: Performance of PRoLoRA with rank 32 with respect to unshared ranks and learning rates, given a specific parameter budget, on the LLaMA2-7B model and the BBH benchmark. Specifically, when the unshared rank is 8, all the ranks are unshared (i.e., vanilla LoRA).
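
The configuration in the Figure 1 caption ($r=4$, $u=1$, $m=2$, $n=3$) can be made concrete with a small pure-Python sketch. It is not the authors' implementation, and it rests on assumptions that the text above does not spell out: that the $r-u$ shared ranks are stored as a single chunk of width $h/m$ (or $h/n$ for $\mathbf{B}$), that each replicated chunk $i$ is rolled along the rank axis by $i$ times a base stride, and that the $u$ unshared ranks are stored at full width. The function names, the hidden sizes `h_in` and `h_out`, and the roll direction are hypothetical.

```python
def lora_param_count(h_in, h_out, r):
    """Trainable parameters of vanilla LoRA: A is (r x h_in), B is (h_out x r)."""
    return r * (h_in + h_out)


def prolora_param_count(h_in, h_out, r, u, m, n):
    """Trainable parameters under the ASSUMED layout described in the lead-in.

    u unshared ranks are stored at full width; the remaining (r - u) shared
    ranks are stored once, as a chunk of width h_in / m (A) or h_out / n (B).
    """
    if not 0 <= u <= r:
        raise ValueError("unshared rank u must satisfy 0 <= u <= r")
    if h_in % m or h_out % n:
        raise ValueError("hidden sizes must be divisible by the sharing rates")
    shared = r - u
    a_params = u * h_in + shared * (h_in // m)
    b_params = u * h_out + shared * (h_out // n)
    return a_params + b_params


def build_shared_rows(base_chunk, copies, stride=1):
    """Broadcast one shared chunk `copies` times along the hidden dimension.

    base_chunk is a list of rows (one per shared rank). Copy i is rolled along
    the rank axis by i * stride (ASSUMED direction), so replicated chunks are
    not identical even though they reuse the same trainable values.
    """
    rows = len(base_chunk)
    out = [[] for _ in range(rows)]
    for i in range(copies):
        shift = (i * stride) % rows
        for row in range(rows):
            out[row].extend(base_chunk[(row - shift) % rows])
    return out


# Figure 1 configuration with illustrative hidden sizes of 6.
print(prolora_param_count(6, 6, r=4, u=1, m=2, n=3))  # 27
print(lora_param_count(6, 6, r=4))                    # 48

# Three shared ranks, chunk width 2, replicated twice with base stride 1.
print(build_shared_rows([[1, 1], [2, 2], [3, 3]], copies=2))
```

Under these assumptions, setting `u = r` recovers the vanilla LoRA count, which matches the note in the Figure 2 caption that a fully unshared configuration is plain LoRA.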