Rethinking Parameter Sharing for LLM Fine-Tuning with Multiple LoRAs

Hao Ban, Kaiyi Ji

TL;DR

This work reevaluates parameter sharing in multi-LoRA fine-tuning for LLMs, showing that similarity among $A$ matrices largely stems from identical initialization rather than shared knowledge, while the $B$ matrices carry domain knowledge. It proposes ALoRA, an asymmetric multi-LoRA architecture with multiple $A$ matrices and a single shared $B$, and Fed-ALoRA, which shares $B$ across clients and employs a decomposition strategy to support heterogeneous client ranks. Across intra-domain multi-task, cross-domain NLP, and federated NLP benchmarks, these methods yield more balanced task performance with comparable or superior accuracy and substantially reduced communication compared with full-LoRA aggregation and prior sharing approaches. The findings offer a practical path to efficient, knowledge-rich adaptation of LLMs in multi-task and federated settings, with broad implications for parameter-efficient fine-tuning and cross-task transfer.

Abstract

Large language models are often adapted using parameter-efficient techniques such as Low-Rank Adaptation (LoRA), formulated as $y = W_0x + BAx$, where $W_0$ denotes the pre-trained weights and $x$ is the input to the adapted layer. While multi-adapter extensions often employ multiple LoRAs, prior studies suggest that the inner $A$ matrices are highly similar during training and thus suitable for sharing. We revisit this phenomenon and find that this similarity is largely attributable to identical initialization rather than shared knowledge, with $B$ playing a more critical role in knowledge encoding and transfer. Motivated by these insights, we propose ALoRA, an asymmetric multi-LoRA design with multiple $A$ matrices and a single shared $B$ for multi-task fine-tuning, and Fed-ALoRA, which shares $B$ across clients in federated fine-tuning under both homogeneous and heterogeneous settings, using a novel matrix decomposition strategy to accommodate heterogeneous ranks across clients. Experiments on commonsense reasoning, math reasoning, multi-task NLP, and federated NLP datasets demonstrate that our methods achieve more balanced performance across tasks with comparable or superior average accuracy relative to existing multi-LoRA approaches. Code is available at https://github.com/OptMN-Lab/ALoRA.
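
As a rough illustration of the asymmetric design described in the abstract, the sketch below implements $y = W_0x + BA_tx$ with one $A_t$ per task and a single shared $B$. It is a sketch under stated assumptions, not the authors' implementation: the class and helper names are hypothetical, $W_0$ is a random stand-in for frozen pre-trained weights, and selecting $A_t$ by task index is an assumption, since the abstract does not say how the multiple $A$ matrices are combined. $B$ is zero-initialized as in standard LoRA, and plain Python lists keep the example dependency-free.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Draw a rows x cols matrix with independent Gaussian entries."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ALoRALayer:
    """Hypothetical layer: frozen W0, one A per task, one shared B."""

    def __init__(self, d_in, d_out, rank, num_tasks, seed=0):
        rng = random.Random(seed)
        # Stand-in for the frozen pre-trained weights W0 (d_out x d_in).
        self.W0 = rand_matrix(d_out, d_in, rng)
        # One A per task (rank x d_in), each drawn independently so the
        # A matrices do not start from an identical initialization.
        self.A = [rand_matrix(rank, d_in, rng) for _ in range(num_tasks)]
        # Single shared B (d_out x rank), zero-initialized as in standard
        # LoRA so the adapted layer initially matches the base layer.
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x, task):
        # Assumption: the caller picks A by task index.
        base = matvec(self.W0, x)
        low_rank = matvec(self.B, matvec(self.A[task], x))
        return [b + l for b, l in zip(base, low_rank)]


layer = ALoRALayer(d_in=4, d_out=3, rank=2, num_tasks=2)
x = [1.0, 2.0, 3.0, 4.0]
y_task0 = layer.forward(x, task=0)
```

With $B$ still at zero, `y_task0` equals the base output `matvec(layer.W0, x)`. Once $B$ is trained, the tasks share it but project the input through different $A_t$, so their outputs diverge.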

Paper Structure

This paper contains 28 sections, 6 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Layer-wise similarity analysis between different LoRA modules. Left: two different tasks with the same random seed. Middle: the same task with different random seeds. Right: two different tasks with different random seeds. $A_i$ matrices are similar only under the same initialization, whereas $B_i$ exhibits relatively stable similarity across different tasks and seeds.
  • Figure 2: Comparison of LoRA modules before and after fine-tuning. Left: similarity; Middle: magnitude change; Right: direction change. The module $A$ remains largely unchanged from initialization, whereas the module $B$ exhibits pronounced variation in both magnitude and direction. Overall, LoRA shows limited magnitude change, with nearly all directional change captured by $B$.
  • Figure 3: Comparing sharing $A$ versus $B$ in multi-task fine-tuning. Left: gradient magnitudes of $A$ and $B$. Right: number of gradient conflicts per layer. Sharing $A$ causes smaller gradient magnitudes and more frequent conflicts than sharing $B$.
  • Figure 4: ALoRA adopts multiple $A$ and a single $B$ to explore diverse feature subspaces.
  • Figure 5: Fed-ALoRA shares only the $B$ matrices for server aggregation. Left: Homogeneous setting (same rank), where the shared $B$ is transmitted directly. Right: Heterogeneous setting (different ranks), where the shared $B$ is decomposed into two matrices to accommodate the differing ranks. Compared to standard full-LoRA aggregation, the communication cost per client is reduced to $\mathcal{O}(d_\text{out}r)$ in the homogeneous setting and $\mathcal{O}(d_\text{out}r_i)$ in the heterogeneous setting if $d_m$ is chosen appropriately.
  • ...and 4 more figures
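
The communication claim in the Figure 5 caption can be illustrated with a simple parameter count for the homogeneous setting. The sketch below assumes $A \in \mathbb{R}^{r \times d_\text{in}}$ and $B \in \mathbb{R}^{d_\text{out} \times r}$, as implied by $y = W_0x + BAx$, and counts the values uploaded per adapted layer per client. The function names and the example dimensions are hypothetical, and the heterogeneous case is not modeled because the caption does not specify the decomposition.

```python
def full_lora_params(d_in, d_out, rank):
    """Values uploaded per layer when both A (rank x d_in) and B (d_out x rank) are sent."""
    return rank * d_in + d_out * rank


def shared_b_params(d_out, rank):
    """Values uploaded per layer when only B (d_out x rank) is sent."""
    return d_out * rank


# Hypothetical square layer with d_in = d_out = 4096 and rank 8.
full = full_lora_params(d_in=4096, d_out=4096, rank=8)
b_only = shared_b_params(d_out=4096, rank=8)
print(full, b_only, b_only / full)  # prints: 65536 32768 0.5
```

For a square layer the upload is halved; the saving grows when $d_\text{in}$ exceeds $d_\text{out}$, since the $r \cdot d_\text{in}$ term is dropped entirely.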