ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Huandong Chang, Zicheng Ma, Mingyuan Ma, Zhenting Qi, Andrew Sabot, Hong Jiang, H. T. Kung

TL;DR

ElaLoRA tackles the inefficiency of fine-tuning large models by introducing a fully adaptive low-rank adaptation framework that prunes and expands LoRA ranks during training. It builds on an SVD-based parameterization $W = W^{(0)} + P \Lambda Q$ and is guided by gradient-based importance scores $s(w) = \left| w \frac{\partial L}{\partial w} \right|$, which are smoothed with an exponential moving average (EMA), so that capacity is allocated selectively to the most impactful layers. The method comprises a three-phase learning schedule (warm-up, dynamic adjustment, stabilization) plus a dynamic rank scheduler, and it is the first to support both rank pruning and rank expansion within the same fine-tuning run. Across GLUE, XSum, and VTAB benchmarks, ElaLoRA outperforms fixed-rank LoRA and AdaLoRA under various parameter budgets. Analyses show that high-rank allocations align with the most task-relevant components, offering a scalable, resource-efficient path for PEFT in constrained environments.
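To make the importance score concrete, here is a minimal sketch in plain Python of $s(w) = \left| w \frac{\partial L}{\partial w} \right|$ followed by EMA smoothing. The function names, the toy values, and the smoothing factor `beta` are assumptions chosen for illustration; this summary does not give the paper's actual hyperparameters or say how scores are aggregated per rank.

```python
from typing import List


def importance(weights: List[float], grads: List[float]) -> List[float]:
    """Element-wise sensitivity s(w) = |w * dL/dw|."""
    return [abs(w * g) for w, g in zip(weights, grads)]


def ema_update(prev: List[float], current: List[float], beta: float) -> List[float]:
    """Exponential moving average: beta * prev + (1 - beta) * current.

    `beta` is a hypothetical smoothing factor, not a value from the paper.
    """
    return [beta * p + (1.0 - beta) * c for p, c in zip(prev, current)]


# Toy values (assumed): three parameters and their loss gradients.
weights = [0.5, -2.0, 0.0]
grads = [0.2, 0.1, 3.0]

raw = importance(weights, grads)
smoothed = ema_update([0.0, 0.0, 0.0], raw, beta=0.5)
```

Note that the third parameter has a large gradient but a zero weight, so its score is zero: the product form rewards parameters that are both large and loss-sensitive.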

Abstract

Low-Rank Adaptation (LoRA) has become a widely adopted technique for fine-tuning large-scale pre-trained models with minimal parameter updates. However, existing methods rely on fixed ranks or focus solely on either rank pruning or expansion, failing to adapt ranks dynamically to match the importance of different layers during training. In this work, we propose ElaLoRA, an adaptive low-rank adaptation framework that dynamically prunes and expands ranks based on gradient-derived importance scores. To the best of our knowledge, ElaLoRA is the first method that enables both rank pruning and expansion during fine-tuning. Experiments across multiple benchmarks demonstrate that ElaLoRA consistently outperforms existing PEFT methods across different parameter budgets. Furthermore, our studies validate that layers receiving higher rank allocations contribute more significantly to model performance, providing theoretical justification for our adaptive strategy. By introducing a principled and adaptive rank allocation mechanism, ElaLoRA offers a scalable and efficient fine-tuning solution, particularly suited for resource-constrained environments.
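The idea of pruning and expanding ranks under a fixed budget can be sketched as follows. This is not the paper's scheduler: the rule used here (drop the globally least important rank components, then hand the freed ranks to the layers with the strongest scores) and the names `reallocate_ranks`, `rank_scores`, `k`, and `min_rank` are assumptions made purely for illustration.

```python
from typing import Dict, List


def reallocate_ranks(
    rank_scores: Dict[str, List[float]], k: int, min_rank: int = 1
) -> Dict[str, int]:
    """Move up to k ranks from low-importance to high-importance layers.

    rank_scores maps a layer name to the importance of each of its current
    ranks (each list is assumed non-empty). Hypothetical rule, not the
    paper's algorithm.
    """
    ranks = {name: len(scores) for name, scores in rank_scores.items()}

    # Prune: visit rank components from least to most important.
    components = sorted(
        (score, name) for name, scores in rank_scores.items() for score in scores
    )
    pruned = 0
    for _, name in components:
        if pruned == k:
            break
        if ranks[name] > min_rank:  # never drop a layer below min_rank
            ranks[name] -= 1
            pruned += 1

    # Expand: give freed ranks to layers ordered by their strongest score.
    by_strength = sorted(
        rank_scores, key=lambda n: max(rank_scores[n]), reverse=True
    )
    for i in range(pruned):
        ranks[by_strength[i % len(by_strength)]] += 1
    return ranks


# Toy scores (assumed): three layers, each starting at rank 2.
scores = {"query": [0.9, 0.8], "value": [0.5, 0.05], "ffn": [0.02, 0.01]}
new_ranks = reallocate_ranks(scores, k=1)
```

In this toy run the least important component belongs to `ffn`, so that layer shrinks to rank 1 while `query` grows to rank 3, and the total number of ranks stays at 6.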

Paper Structure

This paper contains 34 sections, 5 equations, 9 figures, 10 tables, 1 algorithm.

Figures (9)

  • Figure 1: Comparison of ElaLoRA, LoRA, and AdaLoRA. LoRA has fixed ranks, AdaLoRA prunes ranks, while ElaLoRA both prunes and expands ranks during training. $R_i$ denotes the importance score of the $i^{th}$ rank.
  • Figure 2: RTE Final Rank Heatmap $(r=4)$
  • Figure 3: RTE Final Rank Heatmap $(r=10)$
  • Figure 4: RTE Performance Comparisons
  • Figure 5: Comparison of importance score distributions for the MRPC task at $r=4$ (left) and $r=10$ (right) settings with different methods. Beginning of ElaLoRA: Right after the Warm-up Phase, where all ranks are still fixed. End of ElaLoRA Rank Searching: Right after the Dynamic Rank Adjustment Phase, where ElaLoRA has finished dynamically allocating ranks. No Rank Change (SVD Setup): Trained for the same number of iterations as ElaLoRA but without any rank updates (i.e., ranks remain static). End of AdaLoRA Pruning: Trained for the same number of iterations as ElaLoRA, but using AdaLoRA's pruning-based approach.
  • ...and 4 more figures