Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging

Haobo Zhang, Jiayu Zhou

TL;DR

This work tackles the problem of merging LoRA-tuned models for multi-task deployment, where prior merging methods often degrade performance due to interference between parameter updates and data distributions. It proposes Orthogonal Subspaces for Robust model Merging (OSRM), which constrains the LoRA subspace before fine-tuning by minimizing $\|A_2 H_1^T\|_F$ subject to $A_2 A_2^T = I$, yielding an analytical solution via the eigenvectors of the sample covariance $S = \frac{1}{k-1} H_1^T H_1$. OSRM’s subspace constraint is complementary to existing merging algorithms and is extended with practical relaxations (e.g., Procrustes-based updates, multi-task generalization). Experiments across eight GLUE tasks and multiple models (RoBERTa-large, T5-large, Llama3 variants) show consistent improvements in merging performance while preserving single-task accuracy, as well as robustness to hyperparameters such as $\lambda$, $k$, and the number of tasks. The results highlight the importance of data-parameter interplay in merging and offer a plug-and-play approach for robust LoRA model merging.
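The constrained minimization described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `osrm_init_A`, the shapes ($H_1$ as a $k \times d$ matrix of task-1 features, $A_2$ as $r \times d$), and the random data are assumptions made for illustration. It relies on the identity $\|A_2 H_1^T\|_F^2 = (k-1)\,\mathrm{tr}(A_2 S A_2^T)$, which under orthonormal rows is minimized by taking the rows of $A_2$ to be the eigenvectors of $S$ with the $r$ smallest eigenvalues.

```python
import numpy as np

def osrm_init_A(H1: np.ndarray, r: int) -> np.ndarray:
    """Hypothetical helper: orthonormal-row A (r x d) minimizing ||A @ H1.T||_F.

    H1 is assumed to be a (k x d) matrix of latent features from another task.
    """
    k, d = H1.shape
    # Covariance as written in the summary (no mean-centering applied here).
    S = H1.T @ H1 / (k - 1)
    # eigh returns eigenvalues in ascending order, eigenvectors as columns.
    _, eigvecs = np.linalg.eigh(S)
    # Rows of A are the eigenvectors with the r smallest eigenvalues.
    return eigvecs[:, :r].T

rng = np.random.default_rng(0)
k, d, r = 16, 64, 8                      # illustrative sizes, not from the paper
H1 = rng.standard_normal((k, d))         # stand-in for task-1 features

A2 = osrm_init_A(H1, r)
# Baseline: a random matrix with orthonormal rows, for comparison.
A_rand = np.linalg.qr(rng.standard_normal((d, r)))[0].T

shift_osrm = np.linalg.norm(A2 @ H1.T)       # Frobenius norm
shift_rand = np.linalg.norm(A_rand @ H1.T)
print(f"eigenvector init: {shift_osrm:.3e}, random init: {shift_rand:.3e}")
```

In this toy setting $k < d$, so $H_1$ has a non-trivial null space and the eigenvector-based $A_2$ maps the task-1 features to nearly zero, while a random orthonormal $A$ does not.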

Abstract

Fine-tuning large language models (LMs) for individual tasks yields strong performance but is expensive for deployment and storage. Recent works explore model merging to combine multiple task-specific models into a single multi-task model without additional training. However, existing merging methods often fail for models fine-tuned with low-rank adaptation (LoRA), due to significant performance degradation. In this paper, we show that this issue arises from a previously overlooked interplay between model parameters and data distributions. We propose Orthogonal Subspaces for Robust model Merging (OSRM) to constrain the LoRA subspace *prior* to fine-tuning, ensuring that updates relevant to one task do not adversely shift outputs for others. Our approach can seamlessly integrate with most existing merging algorithms, reducing the unintended interference among tasks. Extensive experiments on eight datasets, tested with three widely used LMs and two large LMs, demonstrate that our method not only boosts merging performance but also preserves single-task accuracy. Furthermore, our approach exhibits greater robustness to the hyperparameters of merging. These results highlight the importance of data-parameter interaction in model merging and offer a plug-and-play solution for merging LoRA models.

Paper Structure

This paper contains 46 sections, 16 equations, 5 figures, 16 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of OSRM, which seeks a data-driven subspace in which to initialize LoRA fine-tuning and thereby greatly improves model performance when merging multiple LoRA models from different tasks. $W_0$ is the pre-trained weight. $\{B_i,A_i\}$ are the LoRA matrices fine-tuned on the $i$-th task. Purple: $(W_0+B_1A_1)h_1$ is the required output. Light blue: Decompose the sample covariance matrix to initialize $A_2$. Dark blue: Reduce the output shift induced by $B_2A_2$.
  • Figure 2: The rank of $H_1$ (y-axis) vs. the number of samples $k$ (x-axis) with RoBERTa-large. The grey line represents $y=x$. For each dot, $k$ samples are randomly selected and their latent features are concatenated to form $H_1$.
  • Figure 3: The change in $\tilde{A}$ ($\%$) after fine-tuning relative to its initialization. A normalized distance is used as the metric. See the subsection on extensions for details.
  • Figure 4: Effect of scaling coefficients on the performance of TA and TIES merging. Results are averaged across eight datasets. The solid line is the merging performance for each scaling coefficient. The dashed line is the average performance for each method.
  • Figure 5: Performance of merging different numbers of tasks with RegMean and EMR. Results are averaged across eight datasets. The solid line is the merging performance for each number of tasks. The dashed line is the average performance for each method.
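
The "output shift" in the Figure 1 caption can be made concrete with a short worked derivation. This is a sketch under stated assumptions rather than the paper's own formulation: it assumes the two LoRA updates are merged by plain addition with unit scaling (the merging algorithms named above may weight or combine updates differently), and that the columns of $H_1^T$ are task-1 features $h_1$. The symbol $W_m$ for the merged weight is introduced here for illustration.

```latex
\begin{align*}
W_m &= W_0 + B_1 A_1 + B_2 A_2 \\
W_m h_1 &= \underbrace{(W_0 + B_1 A_1)\, h_1}_{\text{required output}}
          + \underbrace{B_2 A_2\, h_1}_{\text{output shift}} \\
\|B_2 A_2 H_1^T\|_F &\le \|B_2\|_2 \, \|A_2 H_1^T\|_F
\end{align*}
```

The last line uses the standard bound $\|XY\|_F \le \|X\|_2 \|Y\|_F$. It suggests why keeping $\|A_2 H_1^T\|_F$ small limits the shift on task-1 inputs regardless of what $B_2$ learns, provided $A_2$ stays close to its initialization.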