ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models
Zequan Liu, Jiawen Lyn, Wei Zhu, Xing Tian, Yvette Graham
TL;DR
This work addresses a limitation of LoRA-based parameter-efficient fine-tuning: the rank is fixed in advance. Allocating LoRA (ALoRA) instead reallocates the low-rank budget across Transformer modules during fine-tuning. It relies on AB-LoRA, an ablation-based estimator of rank importance that does not depend on heuristic architecture weights. Guided by these scores, ALoRA gradually prunes redundant or harmful ranks and redistributes the freed budget to important modules. The method trains a super-network, evaluates each rank's contribution through validation-based ablations, and alternates pruning and reallocation without bi-level optimization. Experiments across diverse NLP tasks and backbones report gains over LoRA variants and adapter baselines at comparable tunable parameter counts.
Abstract
Parameter-efficient fine-tuning (PEFT) is widely studied for its effectiveness and efficiency in the era of large language models. Low-rank adaptation (LoRA) is a popular and representative PEFT method with strong performance. However, it is implemented with a fixed intrinsic rank, which may not be the ideal setting for a given downstream task. To allow more flexible downstream adaptation, we extend LoRA to allocating low-rank adaptation (ALoRA), which adjusts the intrinsic rank dynamically during the adaptation process. First, we propose a novel method, AB-LoRA, that effectively estimates the importance score of each LoRA rank. Second, guided by AB-LoRA, we gradually prune redundant and negatively impacting LoRA ranks and allocate the pruned LoRA budgets to important Transformer modules that need higher ranks. Experiments on various tasks demonstrate that ALoRA outperforms recent baselines with comparable numbers of tunable parameters.
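The two steps above (score each rank, then prune and reallocate) can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the paper's implementation: the score used here is simply the drop in a validation metric when a single rank is masked, which may differ from the exact AB-LoRA formula, and the names `ablation_scores`, `prune_and_reallocate`, `toy_evaluate`, and the `contribution` table are hypothetical. The reallocation rule (hand freed ranks to untouched modules with the highest mean score) is likewise an assumption.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

RankId = Tuple[str, int]  # (module name, rank index); hypothetical naming


def ablation_scores(active: List[RankId],
                    evaluate: Callable[[FrozenSet[RankId]], float]
                    ) -> Dict[RankId, float]:
    """Score each rank by the drop in validation metric when it alone is masked."""
    full = evaluate(frozenset(active))
    return {r: full - evaluate(frozenset(active) - {r}) for r in active}


def prune_and_reallocate(ranks_per_module: Dict[str, int],
                         scores: Dict[RankId, float],
                         n_prune: int) -> Dict[str, int]:
    """Drop the n_prune lowest-scoring ranks and give the freed budget to
    modules that lost nothing, best mean score first (assumed rule)."""
    pruned = set(sorted(scores, key=scores.get)[:n_prune])
    new_ranks = dict(ranks_per_module)
    for module, _ in pruned:
        new_ranks[module] -= 1

    touched = {module for module, _ in pruned}
    # Fall back to all modules if every module lost at least one rank.
    candidates = [m for m in new_ranks if m not in touched] or list(new_ranks)

    def mean_kept(module: str) -> float:
        kept = [s for r, s in scores.items()
                if r[0] == module and r not in pruned]
        return sum(kept) / len(kept) if kept else float("-inf")

    candidates.sort(key=mean_kept, reverse=True)
    for i in range(len(pruned)):
        new_ranks[candidates[i % len(candidates)]] += 1
    return new_ranks


# Toy stand-in for "validation metric of the super-network with these ranks on".
# Each rank adds a made-up, independent contribution; one rank is harmful.
contribution = {("query", 0): 0.30, ("query", 1): 0.20,
                ("ffn", 0): 0.05, ("ffn", 1): -0.10}


def toy_evaluate(active: FrozenSet[RankId]) -> float:
    return sum(contribution[r] for r in active)


scores = ablation_scores(list(contribution), toy_evaluate)
new_ranks = prune_and_reallocate({"query": 2, "ffn": 2}, scores, n_prune=1)
print(new_ranks)  # {'query': 3, 'ffn': 1}
```

In this toy run the harmful rank in `ffn` is pruned and its budget moves to `query`, so the total number of ranks stays at four. In a real setting `evaluate` would run the fine-tuned super-network on validation data with the given ranks enabled, and the prune-and-reallocate step would be repeated gradually over training.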
