WeightLoRA: Keep Only Necessary Adapters

Andrey Veprikov; Vladimir Solodkin; Alexander Zyl; Andrey Savchenko; Aleksandr Beznosikov

TL;DR

WeightLoRA introduces a sparsity-driven adapter selection mechanism for PEFT, learning per-adapter weights to retain only the most impactful LoRA heads during training. The framework, including WeightLoRA and WeightLoRA+, reduces trainable parameters by pruning adapters and, in WeightLoRA+, expands the rank of selected adapters to boost capacity. Across NLP tasks and models (e.g., DeBERTaV3-base, BART, Llama3-7B) on GLUE, SQuAD, XSum, and CNN/DailyMail, WeightLoRA matches or exceeds LoRA performance with far fewer trainable parameters, while WeightLoRA+ often outperforms LoRA and WeightLoRA. The results demonstrate practical memory-efficient fine-tuning with competitive or superior accuracy, offering a scalable solution for resource-constrained environments. The work provides public code to facilitate replication and adoption in real-world PEFT workflows.

Abstract

The widespread utilization of language models in modern applications is inconceivable without Parameter-Efficient Fine-Tuning techniques, such as low-rank adaptation ($\texttt{LoRA}$), which adds trainable adapters to selected layers. Although $\texttt{LoRA}$ can reach accurate solutions, it requires significant memory when training large models, as well as intuition about which layers should receive adapters. In this paper, we propose a novel method, $\texttt{WeightLoRA}$, which addresses both issues by adaptively selecting the most critical $\texttt{LoRA}$ heads throughout the optimization process. As a result, we can significantly reduce the number of trainable parameters while still obtaining comparable or even superior metric values. We conduct experiments on a series of competitive benchmarks with DeBERTa, BART, and Llama models, comparing our method with different adaptive approaches. The experimental results demonstrate the efficacy of $\texttt{WeightLoRA}$ and the superior performance of its variant $\texttt{WeightLoRA+}$ in almost all cases.
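
To make the idea of keeping only the necessary adapters concrete, the following is a minimal sketch, not the authors' implementation. It assumes each adapter carries one scalar weight and that selection keeps the `k` adapters with the largest weight magnitude; that rule, the function names `keep_top_k` and `lora_param_count`, and all numbers are illustrative assumptions rather than details stated above. The parameter count uses the standard LoRA factorization, in which an adapter of rank `r` on a `d_out x d_in` matrix trains `r * (d_in + d_out)` parameters.

```python
from typing import List


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def keep_top_k(weights: List[float], k: int) -> List[float]:
    """Zero every adapter weight except the k largest in magnitude (assumed selection rule)."""
    if not 0 <= k <= len(weights):
        raise ValueError("k must be between 0 and the number of adapters")
    # Indices sorted by decreasing |weight|; ties are resolved by the lower index.
    order = sorted(range(len(weights)), key=lambda i: (-abs(weights[i]), i))
    kept = set(order[:k])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


# Hypothetical setup: six adapters on 768 x 768 projections, rank 8, keep 2.
weights = [0.9, -0.1, 0.05, -1.3, 0.4, 0.0]
sparse = keep_top_k(weights, k=2)
active = sum(1 for w in sparse if w != 0.0)

per_adapter = lora_param_count(768, 768, 8)
full = len(weights) * per_adapter   # all six adapters trainable
pruned = active * per_adapter       # only the selected adapters trainable
print(sparse, full, pruned)
```

In this toy setting the adapters with weights `0.9` and `-1.3` survive, and the trainable parameter count drops from 73,728 to 24,576. An adapter whose weight is zero contributes nothing to the layer output, so it can be dropped from training entirely, which is where the memory saving would come from.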
