ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation

Yurun Song, Junchen Zhao, Ian G. Harris, Sangeetha Abdu Jyothi

TL;DR

ShareLoRA introduces a parameter-efficient fine-tuning method that shares low-rank adapters across layers to dramatically reduce trainable parameters while preserving and often improving performance across NLU, NLG, and cross-domain tasks. It defines three configurations (ShareA, ShareB, ShareAB) and a self-attention variant, with ShareA generally providing the best balance of efficiency and accuracy. Across RoBERTa, GPT-2, and LLaMA series, ShareLoRA achieves 44%–96% parameter reductions, measurable memory savings, and robust continual adaptation, outpacing standard LoRA in zero-shot and few-shot settings and enhancing cross-domain generalization. This approach enables high-quality fine-tuning on large models in resource-constrained environments, with practical benefits for model deployment and transfer learning.

Abstract

In this paper, we introduce Shared Low Rank Adaptation (ShareLoRA), a Large Language Model (LLM) fine-tuning technique that balances parameter efficiency, adaptability, and robustness without compromising performance. By strategically sharing the low-rank weight matrices across different layers, ShareLoRA achieves 44% to 96% reduction in trainable parameters compared to standard LoRA, alongside a substantial decrease in memory overhead. This efficiency gain scales with model size, making ShareLoRA particularly advantageous for resource-constrained environments. Importantly, ShareLoRA not only maintains model performance but also exhibits robustness in both classification and generation tasks across diverse models, including RoBERTa, GPT-2, and LLaMA series (1, 2, and 3). It consistently outperforms LoRA in zero-shot, few-shot, and continual fine-tuning scenarios, achieving up to 1.2% average accuracy improvement, and enhanced generalization across domains. In continual learning settings, ShareLoRA achieves 1.2% higher accuracy on GSM8K, 0.6% on HumanEval, and 0.5% on both MMLU and MMLU-Pro. Our results demonstrate that ShareLoRA supports high-quality fine-tuning while offering strong generalization and continual adaptation across various model scales and diverse tasks.
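
To make the source of the parameter savings concrete, the following is a minimal counting sketch, not the authors' implementation. It assumes the standard LoRA convention in which each adapted weight of shape d_out x d_in receives a down-projection A (rank x d_in) and an up-projection B (d_out x rank), and it assumes that "sharing" means a single copy of the shared matrix is reused by every layer. The function names and the layer dimensions in the usage note are hypothetical and are not taken from the paper's experimental setup.

```python
def adapter_param_counts(num_layers: int, d_in: int, d_out: int, rank: int) -> dict:
    """Count trainable adapter parameters for one adapted weight per layer.

    Assumption (not from the paper): A has shape (rank, d_in), B has shape
    (d_out, rank), and a shared matrix is stored once for all layers.
    """
    a_params = rank * d_in   # one A matrix
    b_params = d_out * rank  # one B matrix
    return {
        "LoRA": num_layers * (a_params + b_params),  # every layer owns A and B
        "ShareA": a_params + num_layers * b_params,  # one shared A, per-layer B
        "ShareB": num_layers * a_params + b_params,  # per-layer A, one shared B
        "ShareAB": a_params + b_params,              # one shared A and one shared B
    }


def reduction_vs_lora(counts: dict) -> dict:
    """Fraction of trainable parameters saved relative to standard LoRA."""
    base = counts["LoRA"]
    return {name: 1.0 - n / base for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical dimensions chosen only for illustration.
    counts = adapter_param_counts(num_layers=32, d_in=4096, d_out=4096, rank=64)
    for name, saved in reduction_vs_lora(counts).items():
        print(f"{name}: {counts[name]:,} trainable parameters, {saved:.1%} saved")
```

Under these assumptions, the saving from sharing one matrix approaches one half as the number of layers grows (for square weights), while sharing both matrices leaves only a single A and B pair. The actual reductions reported in the abstract depend on which modules are adapted and on the model architecture, so this sketch should be read as an illustration of the mechanism rather than a reproduction of the reported figures.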

Paper Structure

This paper contains 26 sections, 4 equations, 7 figures, and 11 tables.

Figures (7)

  • Figure 1: Overview of ShareLoRA: The implementation of ShareA, ShareB, and ShareAB across all layers (left), including ShareA applied across self-attention layers (right).
  • Figure 2: Memory Consumption of LLaMA3 70B with QLoRA and QLoRA-shareA (QShareA).
  • Figure 3: Distribution of Singular Values for LLaMA 13B: SVD Decomposition Analysis of LoRA (left) and ShareA (right) across All Layers.
  • Figure 4: MMLU Dev performance of LLaMA 7B & 13B with LoRA / ShareA (upper) and QLoRA / QShareA (lower), with the standard deviation across different seeds.
  • Figure 5: Average Performance Plot for Various LLaMA Models on the Alpaca-MMLU Dev Dataset
  • ...and 2 more figures