Less is More: Resource-Efficient Low-Rank Adaptation
Chunlin Tian, Xuyang Wei, Huanrong Liu, Zhijiang Guo, Li Li
TL;DR
EffiLoRA tackles parameter redundancy and task interference in parameter-efficient fine-tuning by sharing a global low-rank matrix across Transformer layers and employing input-driven, layer-specific B heads with a lightweight Router. A dynamic Reducer further tailors training by selectively freezing B heads based on importance scores, enabling substantial parameter savings with minimal performance loss. Across language, vision-language, and diffusion tasks, EffiLoRA consistently outperforms LoRA and competitive baselines, achieving strong accuracy with dramatically fewer tunable parameters and reduced training cost. The approach offers a practical path to robust, scalable PEFT in heterogeneous, resource-constrained settings and suggests avenues for future work in pre-training and routing mechanisms.
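To make the mechanism above concrete, here is a minimal NumPy sketch of one plausible forward pass: a single A matrix shared by all layers, several layer-specific B heads, and a small linear router that mixes the heads per input. It is an illustration under stated assumptions, not the authors' implementation. The sizes, the softmax gating, the zero initialisation of the B heads, and the names `A_shared`, `B_heads`, `W_router`, and `lora_delta` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
d_in, d_out, rank = 16, 16, 4
num_layers, num_heads = 3, 2

# One low-rank down-projection A, shared by every layer.
A_shared = rng.normal(scale=0.01, size=(rank, d_in))

# Layer-specific B heads. Zero initialisation (an assumption borrowed from
# standard LoRA) makes the update start at zero.
B_heads = np.zeros((num_layers, num_heads, d_out, rank))

# Assumed router: one small linear map per layer that scores each head.
W_router = rng.normal(scale=0.01, size=(num_layers, num_heads, d_in))


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def lora_delta(x, layer):
    """Low-rank update for a batch x of shape (batch, d_in) at one layer."""
    gates = softmax(x @ W_router[layer].T)                    # (batch, heads)
    z = x @ A_shared.T                                        # (batch, rank)
    per_head = np.einsum("hor,br->bho", B_heads[layer], z)    # (batch, heads, d_out)
    return np.einsum("bh,bho->bo", gates, per_head)           # (batch, d_out)
```

In this sketch the returned update would be added to the frozen layer's output. Because the gates depend on `x`, different inputs weight the B heads differently, which is one way to read "input-driven" in the summary above.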
Abstract
Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs), but it still incurs notable overhead and suffers from parameter interference on complex datasets. Although recent works decouple the LoRA update matrices to exploit matrix-wise asymmetry, training costs remain high. We revisit LoRA from the perspective of inter-matrix and intra-layer parameter redundancy and propose Resource-Efficient Low-Rank Adaptation (EffiLoRA), a lightweight and generalizable approach for language, multimodal, and diffusion models. EffiLoRA employs a single A matrix shared across all Transformer layers and introduces runtime selective updating of the B matrices to dynamically trade off the system resource budget against model performance. EffiLoRA consistently outperforms LoRA across diverse modalities, including commonsense reasoning, visual instruction tuning, and image generation, demonstrating improved efficiency and robustness.
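The saving from sharing A, and the budget trade-off from updating only some B matrices, can be illustrated with simple arithmetic. The sketch below is an assumption-laden illustration, not the paper's accounting. The layer count, hidden size, and rank are hypothetical, router parameters are ignored, and the importance scores are placeholders because this section does not say how they are computed.

```python
def lora_params(num_layers, d_in, d_out, rank):
    """Trainable parameters when every layer has its own A and B (standard LoRA)."""
    return num_layers * (rank * d_in + d_out * rank)


def shared_a_params(num_layers, d_in, d_out, rank, num_heads=1):
    """Trainable parameters with one A shared by all layers and
    num_heads B matrices per layer (router parameters ignored)."""
    return rank * d_in + num_layers * num_heads * d_out * rank


def heads_to_train(importance, budget):
    """Indices of the `budget` highest-scoring B heads; the rest stay frozen."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:budget])


# Hypothetical configuration: 32 layers, hidden size 4096, rank 8.
per_layer = lora_params(32, 4096, 4096, 8)       # 2097152
shared = shared_a_params(32, 4096, 4096, 8)      # 1081344

# Placeholder importance scores for four B heads, with a budget of two.
active = heads_to_train([0.1, 0.9, 0.4, 0.7], budget=2)   # [1, 3]
```

Under these assumed sizes, sharing A roughly halves the trainable parameters, and freezing low-scoring B heads reduces the per-step update cost further in proportion to the budget.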
