WeightLoRA: Keep Only Necessary Adapters

Andrey Veprikov, Vladimir Solodkin, Alexander Zyl, Andrey Savchenko, Aleksandr Beznosikov

TL;DR

WeightLoRA introduces a sparsity-driven adapter selection mechanism for PEFT, learning per-adapter weights to retain only the most impactful LoRA heads during training. The framework, including WeightLoRA and WeightLoRA+, reduces trainable parameters by pruning adapters and, in WeightLoRA+, expands the rank of selected adapters to boost capacity. Across NLP tasks and models (e.g., DeBERTaV3-base, BART, Llama3-7B) on GLUE, SQuAD, XSum, and CNN/DailyMail, WeightLoRA matches or exceeds LoRA performance with far fewer trainable parameters, while WeightLoRA+ often outperforms LoRA and WeightLoRA. The results demonstrate practical memory-efficient fine-tuning with competitive or superior accuracy, offering a scalable solution for resource-constrained environments. The work provides public code to facilitate replication and adoption in real-world PEFT workflows.

Abstract

The widespread use of language models in modern applications is inconceivable without Parameter-Efficient Fine-Tuning (PEFT) techniques such as low-rank adaptation ($\texttt{LoRA}$), which adds trainable adapters to selected layers. Although $\texttt{LoRA}$ can reach accurate solutions, it requires significant memory when training large models, as well as intuition about which layers should receive adapters. In this paper, we propose a novel method, $\texttt{WeightLoRA}$, which addresses both issues by adaptively selecting the most critical $\texttt{LoRA}$ heads throughout the optimization process. As a result, we can significantly reduce the number of trainable parameters while still obtaining comparable or even superior metric values. We conduct experiments on a series of competitive benchmarks with DeBERTa, BART, and Llama models, comparing our method with different adaptive approaches. The experimental results demonstrate the efficacy of $\texttt{WeightLoRA}$ and the superior performance of $\texttt{WeightLoRA+}$ in almost all cases.
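
The selection idea described above can be illustrated with a minimal sketch. It assumes that each adapter's low-rank update $A^iB^i$ is scaled by a scalar weight $\omega_i$ and that the $K$ adapters with the largest $|\omega_i|$ are kept. The top-$K$-by-magnitude rule and the names `weighted_lora_output` and `select_top_k` are illustrative assumptions, not taken from the paper's code.

```python
# Hypothetical sketch of a weighted LoRA adapter and top-K adapter selection.
# Function names and the top-K-by-|omega| rule are assumptions for illustration.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def weighted_lora_output(W, A, B, omega, x):
    """Return W x + omega * A (B x): frozen layer plus a scaled low-rank update."""
    base = matvec(W, x)                # frozen pretrained weights
    update = matvec(A, matvec(B, x))   # low-rank path, rank = len(B)
    return [b + omega * u for b, u in zip(base, update)]


def select_top_k(omegas, k):
    """Return the (sorted) indices of the k adapters with the largest |omega|."""
    ranked = sorted(range(len(omegas)), key=lambda i: abs(omegas[i]), reverse=True)
    return sorted(ranked[:k])


# One layer with a rank-1 adapter.
W = [[1, 0], [0, 1]]
A = [[1], [2]]
B = [[1, 1]]
x = [1, 2]
y_on = weighted_lora_output(W, A, B, 0.5, x)   # adapter active
y_off = weighted_lora_output(W, A, B, 0.0, x)  # omega = 0 disables the adapter

# Four adapters, keep the two with the largest |omega|.
kept = select_top_k([0.9, -0.05, 0.4, -1.2], 2)
```

Setting a weight to zero removes that adapter's contribution entirely, so only the adapters that are kept need trainable parameters and optimizer state.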

Paper Structure

This paper contains 18 sections, 10 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: Comparison between LoRA (left) and the proposed WeightLoRA framework (right). The core idea of WeightLoRA is to add weights to the LoRA adapters, choose the most important ones, and train only this small subset.
  • Figure 2: Comparison of the absolute values of the scalar products $\langle\nabla_{W^i} f(\mathcal{W}), A^iB^i \rangle$ from \eqref{eq:dotprods} for all layers $i \in \{1, 2, \dots, 36\}$. The adapters selected through our WeightLoRA framework are starred.
  • Figure 3: Dependence of the amount of required GPU memory on the number of connected LoRA adapters. The red dashed line indicates the memory capacity of the NVIDIA V100 16GB.
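
The quantity plotted in Figure 2 can be computed as a Frobenius inner product between the gradient of the loss with respect to a layer's weights and that layer's low-rank product $A^iB^i$. The sketch below is one way to evaluate it. The helper names are hypothetical and the toy matrices are made up for illustration. By the chain rule, this inner product equals the derivative of $f(W^i + \omega A^iB^i)$ with respect to $\omega$ at $\omega = 0$, which gives one possible reading of why it is compared across layers. The paper's own interpretation is not part of this excerpt.

```python
# Hypothetical sketch: |<grad_W f, A B>| as a Frobenius inner product.
# Helper names and example values are illustrative assumptions.

def matmul(A, B):
    """Multiply matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def frobenius_inner(G, M):
    """Sum of elementwise products of two equally shaped matrices."""
    return sum(g * m for g_row, m_row in zip(G, M) for g, m in zip(g_row, m_row))


def adapter_alignment(grad_W, A, B):
    """Absolute value of the scalar product between grad_W and the product A B."""
    return abs(frobenius_inner(grad_W, matmul(A, B)))


# Toy values: a 2x2 gradient and a rank-1 adapter.
grad_W = [[1, -2], [0, 3]]
A = [[1], [2]]
B = [[1, 1]]
score = adapter_alignment(grad_W, A, B)  # A B = [[1, 1], [2, 2]], so score = |1 - 2 + 0 + 6| = 5
```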