GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

Yeonjoon Jung, Daehyun Ahn, Hyungjun Kim, Taesu Kim, Eunhyeok Park

TL;DR

LoRA’s structural bottleneck limits PEFT performance at higher ranks because it entangles gradients across input channels. GraLoRA mitigates this by partitioning the weight update into $k\times k$ blocks, each with an independent adapter, which raises the effective rank to $kr$ and localizes gradient propagation. Empirically, GraLoRA consistently surpasses LoRA and other baselines across code generation, reasoning, GLUE, and image generation, achieving up to +8.5% absolute gain on HumanEval+ Pass@1, with improvements that hold across model sizes and ranks. This granular, low-cost extension offers a scalable way to bring PEFT closer to FFT performance across diverse domains.

Abstract

Low-Rank Adaptation (LoRA) is a popular method for parameter-efficient fine-tuning (PEFT) of generative models, valued for its simplicity and effectiveness. Despite recent enhancements, LoRA still suffers from a fundamental limitation: overfitting when the bottleneck is widened. It performs best at ranks 32-64, yet its accuracy stagnates or declines at higher ranks, still falling short of full fine-tuning (FFT) performance. We identify the root cause as LoRA's structural bottleneck, which introduces gradient entanglement to the unrelated input channels and distorts gradient propagation. To address this, we introduce a novel structure, Granular Low-Rank Adaptation (GraLoRA) that partitions weight matrices into sub-blocks, each with its own low-rank adapter. With negligible computational or storage cost, GraLoRA overcomes LoRA's limitations, effectively increases the representational capacity, and more closely approximates FFT behavior. Experiments on code generation and commonsense reasoning benchmarks show that GraLoRA consistently outperforms LoRA and other baselines, achieving up to +8.5% absolute gain in Pass@1 on HumanEval+. These improvements hold across model sizes and rank settings, making GraLoRA a scalable and robust solution for PEFT. Code, data, and scripts are available at https://github.com/SqueezeBits/GraLoRA.git
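The block partitioning described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the toy sizes, and the choice of giving each block a rank of `r // k` (so that the parameter count matches a rank-`r` LoRA) are assumptions made here for illustration. It builds one LoRA-style update and one block-wise update, then compares their numerical ranks.

```python
import numpy as np


def lora_update(M, N, r, rng):
    """Single low-rank update B @ A with B: (M, r) and A: (r, N)."""
    B = rng.standard_normal((M, r))
    A = rng.standard_normal((r, N))
    return B @ A


def gralora_update(M, N, r, k, rng):
    """Block-wise update: k x k sub-blocks, each with its own adapter pair.

    Assumption for this sketch: each block uses rank r // k, so the total
    number of adapter parameters equals that of a rank-r LoRA.
    """
    assert M % k == 0 and N % k == 0 and r % k == 0
    m, n, rb = M // k, N // k, r // k
    delta = np.zeros((M, N))
    for i in range(k):
        for j in range(k):
            B_ij = rng.standard_normal((m, rb))   # block-local "up" factor
            A_ij = rng.standard_normal((rb, n))   # block-local "down" factor
            delta[i * m:(i + 1) * m, j * n:(j + 1) * n] = B_ij @ A_ij
    return delta


# Toy sizes (hypothetical, chosen only to keep the example small).
M, N, r, k = 64, 64, 8, 2
rng = np.random.default_rng(0)

rank_lora = np.linalg.matrix_rank(lora_update(M, N, r, rng))
rank_gralora = np.linalg.matrix_rank(gralora_update(M, N, r, k, rng))

params_lora = (M + N) * r
params_gralora = k * k * (M // k + N // k) * (r // k)

print(rank_lora, rank_gralora, params_lora, params_gralora)
```

With random factors, the block-wise update reaches rank $kr$ while the single-adapter update is capped at $r$, even though both use the same number of adapter parameters under the assumption above.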

Paper Structure

This paper contains 28 sections, 10 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: Illustration of LoRA architecture and GraLoRA architecture. GraLoRA consists of $k^2$ small adapter pairs, where each input and output dimension is $k$ times smaller than the original LoRA.
  • Figure 2: Gradient dynamics of FFT and LoRA in the presence of an outlier input channel. The red channel in input $X$ denotes the outlier. While FFT localizes the gradient impact, LoRA's entire gradient update becomes disproportionately influenced by the single outlier.
  • Figure 3: (a) Mean input channel values for the down-projection matrices across layers in LLaMA3.1–8B. Pronounced outliers appear in Layer 1, at channels 198 and 2427. (b) Gradient deviation between LoRA and FFT increases with rank, showing LoRA’s susceptibility to input outliers. (c) GraLoRA gradient results at rank 128. GraLoRA noticeably reduces the gradient deviation from FFT.
  • Figure 4: Gradient distribution in the Layer 1 down-projection matrix. LoRA gradients align poorly with FFT: the outlier channel inflates the overall gradient scale, while the outlier channel itself receives less emphasis.
  • Figure 5: Regularized form of GraLoRA as the multiplication of two sparse matrices, $A_{\text{GraLoRA}}$ and $B_{\text{GraLoRA}}$.
  • ...and 5 more figures
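The outlier behavior described in the Figure 2 caption can be reproduced on a toy linear layer $Y = WX$. The sketch below uses the standard gradient expressions: $\partial L/\partial W = G X^\top$ for full fine-tuning and $\partial L/\partial B = G X^\top A^\top$ for a LoRA update $W_0 + BA$, where $G = \partial L/\partial Y$. The sizes, the outlier index, and the simplification of holding $G$ fixed while the input changes are assumptions of this example, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, T, r = 6, 8, 32, 2     # output dim, input dim, tokens, LoRA rank (toy values)
outlier = 3                  # hypothetical index of the outlier input channel

X = rng.standard_normal((N, T))   # inputs, one row per input channel
G = rng.standard_normal((M, T))   # upstream gradient dL/dY, held fixed (assumption)
A = rng.standard_normal((r, N))   # LoRA down-projection


def fft_grad(X):
    """Full fine-tuning gradient dL/dW = G @ X^T, shape (M, N)."""
    return G @ X.T


def lora_grad_B(X):
    """LoRA gradient dL/dB = G @ X^T @ A^T, shape (M, r)."""
    return G @ X.T @ A.T


# Scale a single input channel to turn it into an outlier.
X_out = X.copy()
X_out[outlier] *= 100.0

d_fft = fft_grad(X_out) - fft_grad(X)
d_lora = lora_grad_B(X_out) - lora_grad_B(X)

# Columns of the FFT gradient that changed, and share of dL/dB entries that changed.
changed_cols = np.flatnonzero(np.abs(d_fft).max(axis=0) > 1e-9)
fraction_B_changed = float(np.mean(np.abs(d_lora) > 1e-9))

print(changed_cols, fraction_B_changed)
```

In this setting the full fine-tuning gradient changes only in the column that belongs to the outlier channel, whereas every entry of the LoRA gradient for $B$ shifts, because the projection through $A$ mixes all input channels before the gradient reaches the adapter.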