ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

Zequan Liu; Jiawen Lyn; Wei Zhu; Xing Tian; Yvette Graham

TL;DR

This work addresses a limitation of LoRA in parameter-efficient fine-tuning: every adapted module receives the same fixed rank. It introduces Allocating LoRA (ALoRA), which reallocates the low-rank budget across Transformer modules during fine-tuning. ALoRA relies on AB-LoRA, an ablation-based estimator of the importance of each LoRA rank that does not depend on heuristic architecture weights. Guided by these scores, the method gradually prunes redundant or harmful ranks and redistributes the freed budget to the modules that benefit most. In practice, it trains a super-network, scores each rank by ablating it and measuring the effect on validation data, and then alternates pruning and reallocation without bi-level optimization. Experiments across several NLP tasks and backbones show that ALoRA outperforms LoRA variants and adapter baselines at comparable tunable-parameter counts, which makes it a practical option for efficient, flexible fine-tuning of large language models.
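The ablation idea can be illustrated with a minimal sketch. It is not the paper's exact AB-LoRA score: it assumes a plain leave-one-out variant in which a rank's importance is the drop in a validation score when that rank's gate is set to zero. The names `ablation_scores` and `val_score`, and the toy matrices, are hypothetical and chosen so the result is easy to check by hand.

```python
import numpy as np

def ablation_scores(score_fn, gates):
    """Leave-one-out importance (an assumed simplification of AB-LoRA):
    how much the validation score drops when one rank's gate is zeroed."""
    base = score_fn(gates)
    scores = np.zeros(len(gates))
    for i in np.flatnonzero(gates):      # only ranks that are still active
        ablated = gates.copy()
        ablated[i] = 0.0
        scores[i] = base - score_fn(ablated)
    return scores

# Toy adapter with k = 3 ranks; rank 0 matters most, rank 2 contributes nothing.
B = np.eye(4)[:, :3]                     # shape (d_out=4, k=3)
A = np.zeros((3, 4))                     # shape (k=3, d_in=4)
A[0, 0], A[1, 1] = 3.0, 1.0

# Toy validation set whose targets are produced by the full adapter.
X_val = np.eye(4)
Y_val = X_val @ (B @ A).T

def val_score(gates):
    """Higher is better: negative squared error on the validation set."""
    delta_w = (B * gates) @ A            # same as B @ diag(gates) @ A
    pred = X_val @ delta_w.T
    return -float(np.sum((pred - Y_val) ** 2))

gates = np.ones(3)
scores = ablation_scores(val_score, gates)
print(scores)                            # rank 0 scores highest, rank 2 scores zero
```

In this toy setting the rank that carries the largest component receives the largest score, and the rank that contributes nothing scores zero, so it would be the first candidate for pruning.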

Abstract

Parameter-efficient fine-tuning (PEFT) is widely studied for its effectiveness and efficiency in the era of large language models. Low-rank adaptation (LoRA) has demonstrated commendable performance as a popular and representative method. However, it is implemented with a fixed intrinsic rank that might not be the ideal setting for the downstream tasks. Recognizing the need for more flexible downstream task adaptation, we extend the methodology of LoRA to an innovative approach we call allocating low-rank adaptation (ALoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process. First, we propose a novel method, AB-LoRA, that can effectively estimate the importance score of each LoRA rank. Second, guided by AB-LoRA, we gradually prune abundant and negatively impacting LoRA ranks and allocate the pruned LoRA budgets to important Transformer modules needing higher ranks. We have conducted experiments on various tasks, and the experimental results demonstrate that our ALoRA method can outperform the recent baselines with comparable tunable parameters.
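One prune-and-reallocate step, as described above, might look like the following sketch. The selection rule (prune the globally lowest-scoring ranks), the round-robin redistribution to modules that lost no rank, and the module names are all assumptions made for illustration; the abstract does not specify them.

```python
def prune_and_reallocate(scores, n_prune):
    """scores maps a module name to the importance scores of its active ranks.
    Returns a dict mapping each module name to its new rank count.

    Assumed policy (not taken from the paper): prune the n_prune lowest-scoring
    ranks overall, then hand the freed ranks, round-robin, to the modules that
    lost none."""
    ranked = sorted(
        (score, name, idx)
        for name, values in scores.items()
        for idx, score in enumerate(values)
    )
    to_prune = ranked[:n_prune]

    pruned = {name: 0 for name in scores}
    for _, name, _ in to_prune:
        pruned[name] += 1

    new_ranks = {name: len(values) - pruned[name] for name, values in scores.items()}

    # Modules with no pruned rank are treated as the ones needing more capacity.
    untouched = [name for name in scores if pruned[name] == 0]
    if untouched:
        for j in range(len(to_prune)):
            new_ranks[untouched[j % len(untouched)]] += 1
    return new_ranks

# Hypothetical scores for three modules, each starting with 4 ranks.
scores = {
    "query": [0.9, 0.8, 0.7, 0.6],
    "value": [0.5, 0.01, -0.2, 0.4],
    "ffn_up": [0.3, 0.02, 0.6, 0.7],
}
new_ranks = prune_and_reallocate(scores, n_prune=2)
print(new_ranks)
```

Here the two weakest ranks both belong to the hypothetical `value` module, so it shrinks to rank 2 while the other two modules each gain one rank. The total budget of 12 ranks is unchanged, which matches the goal of comparing methods at comparable tunable-parameter counts.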

Paper Structure (31 sections, 6 equations, 4 figures, 9 tables, 1 algorithm)

Introduction
Related works
Parameter-efficient fine-tuning (PEFT) methods
The LoRA method and its variants
Methods
Preliminaries
Formulation
Our novel AB-LoRA method
The complete process of ALoRA
Experiments
Baselines
Datasets and evaluation metrics
Experiment Settings
Main results
Ablation studies and analysis
...and 16 more sections

Figures (4)

Figure 1: Schematic illustration of our ALoRA. Left (a): ALoRA follows LoRA to update the weight matrix $W$ by fine-tuning the low-rank matrices $A$ and $B$ with intermediate rank $k$. Matrix $G$ is a diagonal matrix where each diagonal element is the gate unit $\alpha_{i}$ for each LoRA rank $i < k$. Each $\alpha_{i}$ is set to 1 at initialization. Upper right (b): Some abundant LoRA ranks are pruned by setting the corresponding gates $\alpha_{i}$ to zero. Lower right (c): For a weight matrix $W$ whose LoRA ranks are not pruned, additional LoRA ranks are assigned to enhance reparameterization.
Figure 2: Performances under different LoRA rank budgets. The $x$-axis represents the number of tunable parameters, and the $y$-axis represents the performance score.
Figure 3: The final rank allocations of ALoRA after fine-tuning the LLaMA-2 7B model on the E2E task.
Figure 4: The final rank allocations of SoRA after fine-tuning the LLaMA-2 7B model on the E2E task.
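The gating scheme in the Figure 1 caption can be sketched in a few lines of NumPy. The sketch assumes the common LoRA convention $W' = W + B G A$ with $G = \mathrm{diag}(\alpha)$; the shapes, the random values, and the zero initialization of a newly added column of $B$ (borrowed from standard LoRA initialization) are assumptions, not details confirmed by the caption.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, k = 6, 5, 4
W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
B = rng.normal(size=(d_out, k))          # LoRA up-projection
A = rng.normal(size=(k, d_in))           # LoRA down-projection
alpha = np.ones(k)                       # gates, all 1 at initialization

def adapted_weight(W, B, A, alpha):
    """W' = W + B @ diag(alpha) @ A (assumed ordering of the factors)."""
    return W + B @ np.diag(alpha) @ A

# (b) Pruning: zeroing gate 2 is equivalent to deleting that rank outright.
alpha_pruned = alpha.copy()
alpha_pruned[2] = 0.0
keep = alpha_pruned != 0
W_gated = adapted_weight(W, B, A, alpha_pruned)
W_sliced = W + B[:, keep] @ A[keep, :]
print(np.allclose(W_gated, W_sliced))

# (c) Adding a rank: with a zero-initialized new column in B (an assumption),
# the adapted weight is unchanged until training updates the new parameters.
B_grown = np.hstack([B, np.zeros((d_out, 1))])
A_grown = np.vstack([A, rng.normal(size=(1, d_in))])
alpha_grown = np.append(alpha_pruned, 1.0)
W_grown = adapted_weight(W, B_grown, A_grown, alpha_grown)
print(np.allclose(W_grown, W_gated))
```

Keeping the gates explicit means pruning never requires reshaping $A$ or $B$ during training; the zeroed rows and columns can be physically removed afterwards.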
