Less is More: Resource-Efficient Low-Rank Adaptation

Chunlin Tian, Xuyang Wei, Huanrong Liu, Zhijiang Guo, Li Li

TL;DR

EffiLoRA tackles parameter redundancy and task interference in parameter-efficient fine-tuning by sharing a global low-rank matrix across Transformer layers and employing input-driven, layer-specific B heads with a lightweight Router. A dynamic Reducer further tailors training by selectively freezing B heads based on importance scores, enabling substantial parameter savings with minimal performance loss. Across language, vision-language, and diffusion tasks, EffiLoRA consistently outperforms LoRA and competitive baselines, achieving strong accuracy with dramatically fewer tunable parameters and reduced training cost. The approach offers a practical path to robust, scalable PEFT in heterogeneous, resource-constrained settings and suggests avenues for future work in pre-training and routing mechanisms.

Abstract

Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs), but it still incurs notable overhead and suffers from parameter interference on complex datasets. While recent works decouple the LoRA update matrices to exploit matrix-wise asymmetry, training costs remain high. We revisit LoRA from the perspective of inter-matrix and intra-layer parameter redundancy and propose Resource-Efficient Low-Rank Adaptation (EffiLoRA), a lightweight and generalizable approach for language, multimodal, and diffusion models. EffiLoRA employs a unified A matrix across all Transformer layers and introduces a runtime selective update of the B matrices to dynamically trade off the system resource budget against model performance. EffiLoRA consistently outperforms LoRA across diverse modalities, including commonsense reasoning, visual instruction tuning, and image generation, demonstrating improved efficiency and robustness.
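
The shared-A structure described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes a softmax router that mixes several B heads per layer; the function names (`lora_delta`, `matvec`, `rand_matrix`), the router form, the initialization scale, and the toy dimensions are all hypothetical choices made for illustration, not details taken from the paper.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    """Numerically stable softmax over a list of floats."""
    top = max(z)
    e = [math.exp(t - top) for t in z]
    total = sum(e)
    return [t / total for t in e]

def lora_delta(x, A, B_heads, W_router):
    """Low-rank update for one layer and one input vector.

    A is the matrix shared by every layer (rank x d_in).
    B_heads is this layer's list of (d_out x rank) matrices.
    W_router (n_heads x d_in) is an ASSUMED linear router.
    """
    z = matvec(A, x)                       # shared down-projection
    gates = softmax(matvec(W_router, x))   # input-dependent head weights
    out = [0.0] * len(B_heads[0])
    for gate, B in zip(gates, B_heads):
        out = [o + gate * y for o, y in zip(out, matvec(B, z))]
    return out

random.seed(0)
d_in, d_out, rank, n_layers, n_heads = 4, 3, 2, 2, 3  # toy sizes

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

shared_A = rand_matrix(rank, d_in)  # allocated once, reused by all layers
layers = [
    {
        # B starts at zero, as in standard LoRA, so the update is initially zero.
        "B_heads": [[[0.0] * rank for _ in range(d_out)] for _ in range(n_heads)],
        "router": rand_matrix(n_heads, d_in),
    }
    for _ in range(n_layers)
]

x = [1.0, -0.5, 0.25, 2.0]
deltas = [lora_delta(x, shared_A, layer["B_heads"], layer["router"]) for layer in layers]
```

In this sketch the A matrix contributes rank × d_in parameters in total rather than once per layer, while each layer keeps its own B heads and router weights.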

Paper Structure

This paper contains 35 sections, 4 equations, 7 figures, 13 tables.

Figures (7)

  • Figure 1: Matrix-wise optimization of LoRA.
  • Figure 2: Impact of dropping different numbers of B modules.
  • Figure 3: Performance comparison on heterogeneous data with LLaVA-7B [liu2023llava], evaluated on the VizWiz dataset [bigham2010vizwiz].
  • Figure 4: Architecture and workflow of EffiLoRA. Given a base model and a target dataset, the Configurator generates a shared-asymmetric-head LoRA structure, where a global low-rank matrix $A$ is reused across layers while each $B_{i,j}$ remains layer- and head-specific. A Reducer then prunes redundant $B$ heads under resource and performance constraints, yielding an optimized low-parameter LoRA configuration that balances efficiency and effectiveness.
  • Figure 5: Performance of different drop ratios.
  • ...and 2 more figures
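
The Reducer step mentioned in the Figure 4 caption can be sketched as follows. This is only an illustration under stated assumptions: importance scores are taken as given inputs (how they are computed is not specified here), heads are frozen by simple lowest-score ranking, and `select_frozen_heads`, `sgd_step`, and `drop_ratio` are hypothetical names rather than the authors' interface.

```python
def select_frozen_heads(scores, drop_ratio):
    """Pick the B heads to freeze.

    scores maps (layer, head) -> importance (ASSUMED to be precomputed).
    The lowest-scoring fraction `drop_ratio` of heads is returned.
    """
    if not 0.0 <= drop_ratio <= 1.0:
        raise ValueError("drop_ratio must be in [0, 1]")
    n_freeze = int(drop_ratio * len(scores))
    # Sort ascending by score; ties are broken by the (layer, head) key.
    ranked = sorted(scores, key=lambda key: (scores[key], key))
    return set(ranked[:n_freeze])

def sgd_step(B_heads, grads, frozen, lr=0.1):
    """Plain SGD update that skips frozen heads.

    B_heads and grads map (layer, head) -> flat list of floats.
    """
    for key, params in B_heads.items():
        if key in frozen:
            continue  # frozen heads keep their current values
        B_heads[key] = [p - lr * g for p, g in zip(params, grads[key])]

# Toy example: two layers with two heads each.
scores = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.2}
frozen = select_frozen_heads(scores, drop_ratio=0.5)

B_heads = {key: [1.0, 1.0] for key in scores}
grads = {key: [1.0, 1.0] for key in scores}
sgd_step(B_heads, grads, frozen)
```

With a drop ratio of 0.5, the two lowest-scoring heads are left untouched by the update while the remaining heads are trained as usual.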