Decouple and Orthogonalize: A Data-Free Framework for LoRA Merging

Shenghe Zheng, Hongzhi Wang, Chenyu Huang, Xiaohui Wang, Tao Chen, Jiayuan Fan, Shuyue Hu, Peng Ye

TL;DR

This paper tackles LoRA-specific model merging by identifying magnitude distribution variance as the root cause of degradation when applying full-finetuning merging techniques to LoRA. It introduces DO-Merging, a decoupled approach that separates magnitude and direction and pairs it with data-free, layer-wise orthogonalization to minimize task interference. The authors provide theoretical guarantees for both the decoupled and orthogonal components and validate the method across vision, language, and multi-modal tasks, demonstrating consistent, near-free gains with broad compatibility. The work suggests a practical, data-free pathway to robust multi-task merging, with potential for integration alongside existing merging strategies and extensions to full-finetune scenarios.

Abstract

With more open-source models available for diverse tasks, model merging has gained attention by combining models into one, reducing training, storage, and inference costs. Current research mainly focuses on model merging for full fine-tuning, overlooking the popular LoRA. However, our empirical analysis reveals that: a) existing merging methods designed for full fine-tuning perform poorly on LoRA; b) LoRA modules show much larger parameter magnitude variance than full fine-tuned weights; c) greater parameter magnitude variance correlates with worse merging performance. Because large magnitude variance shifts the distribution of the merged parameters, causing information loss and performance degradation, we propose a Decoupled and Orthogonal merging approach (DO-Merging). By separating parameters into magnitude and direction components and merging them independently, we reduce the impact of magnitude differences on the directional alignment of the merged models, thereby preserving task information. Furthermore, we introduce a data-free, layer-wise gradient descent method with orthogonal constraints to mitigate interference during the merging of direction components. We provide theoretical guarantees for both the decoupling and orthogonal components, and we validate through extensive experiments across vision, language, and multi-modal domains that DO-Merging achieves significantly higher performance than existing merging methods at minimal cost. Notably, each component can be flexibly integrated with existing methods, offering near free-lunch improvements across tasks.
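
To make the decoupling idea concrete, here is a minimal sketch on toy data. It is not the paper's algorithm: it assumes each task update is flattened to a single vector and that "magnitude" means the L2 norm of that whole vector, and the names `naive_merge` and `decoupled_merge` are hypothetical. It only shows how a large norm gap lets one task dominate the direction of a plain average.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D arrays."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def naive_merge(task_vectors):
    """Plain average of the flattened task vectors (one per row)."""
    return task_vectors.mean(axis=0)


def decoupled_merge(task_vectors):
    """Average magnitudes and unit directions separately, then recombine.

    Assumption: magnitude is the L2 norm of each whole row. The paper may
    decouple at a finer granularity; this only illustrates the idea.
    """
    magnitudes = np.linalg.norm(task_vectors, axis=1, keepdims=True)
    directions = task_vectors / magnitudes
    return magnitudes.mean() * directions.mean(axis=0)


# Toy data (made up): two orthogonal updates whose norms differ by 10x.
task_vectors = np.array([[10.0, 0.0], [0.0, 1.0]])

naive = naive_merge(task_vectors)
decoupled = decoupled_merge(task_vectors)

for name, merged in [("naive", naive), ("decoupled", decoupled)]:
    sims = [cosine(merged, tv) for tv in task_vectors]
    print(name, [round(s, 3) for s in sims])
```

On this toy input the plain average points almost entirely along the large-norm task (cosine about 0.995 versus about 0.100), while the decoupled merge is equally aligned with both tasks (cosine about 0.707 each).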

Paper Structure

This paper contains 33 sections, 3 theorems, 24 equations, 6 figures, 14 tables, 1 algorithm.

Key Result

Theorem 3.1

$\mathbb{E}(L)$ achieves its minimum when $\|\alpha_1\|_2=\|\alpha_2\|_2$, and is greater than this minimum in both cases $\|\alpha_1\|_2>\|\alpha_2\|_2$ and $\|\alpha_1\|_2<\|\alpha_2\|_2$.

Figures (6)

  • Figure 1: Key Observations on LoRA Merging. (a) Existing methods work well for full fine-tuning but fail on LoRA. (b) LoRA shows larger parameter discrepancies across tasks than full fine-tuning. The Magnitude Distribution Variance is calculated as discussed in the appendix. (c) A greater parameter discrepancy between models correlates with worse merging performance.
  • Figure 2: DO-Merging Framework. Left: Large magnitude differences in LoRA across tasks degrade merging performance. Middle: DO-Merging process—orthogonal perturbation, decoupling magnitude and direction, and separate merging. Right: Single model deployment for multiple tasks.
  • Figure 3: Key observations on orthogonalization. (a) Change in the average normalized performance of task models during orthogonal gradient descent. Performance remains stable. (b) As orthogonality increases, the average normalized accuracy of the merged model also improves.
  • Figure 4: Discussion on Key Properties of DO-Merging. (a) Both components of DO-Merging can be freely combined with other merging methods and bring near free-lunch improvements. (b) Orthogonality in LoRA is crucial. Performance drops significantly when the direction vectors are made non-orthogonal. (c) Our method performs well across different LoRA ranks.
  • Figure 5: Applying DO-Merging to the merging of fully fine-tuned models is also effective. Experiments are conducted on ViT-B/32.
  • ...and 1 more figure
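
The orthogonalization step that Figures 3 and 4 refer to can be illustrated with a small data-free sketch. Everything below is an assumption made for illustration rather than the paper's procedure: directions are flattened unit vectors, interference is measured as the squared off-diagonal entries of their Gram matrix, and a hypothetical `anchor` penalty keeps each direction close to its starting point so the task models change little.

```python
import numpy as np


def offdiag_energy(D):
    """Sum of squared pairwise inner products between distinct rows of D."""
    G = D @ D.T
    off = G - np.diag(np.diag(G))
    return float(np.sum(off ** 2))


def orthogonalize(D0, steps=200, lr=0.05, anchor=0.1):
    """Gradient descent that pushes the rows of D0 toward mutual orthogonality.

    Objective (an assumption, not taken from the paper):
        offdiag_energy(D) + anchor * ||D - D0||_F^2
    Rows are renormalized after every step so they stay unit directions.
    No task data is used, only the parameters themselves.
    """
    D = D0.copy()
    for _ in range(steps):
        G = D @ D.T
        off = G - np.diag(np.diag(G))
        # d/dD of sum(off**2) is 4 * off @ D because G is symmetric.
        grad = 4.0 * off @ D + 2.0 * anchor * (D - D0)
        D = D - lr * grad
        D /= np.linalg.norm(D, axis=1, keepdims=True)
    return D


# Toy data (made up): three correlated unit directions in 16 dimensions.
rng = np.random.default_rng(0)
shared = rng.normal(size=16)
raw = shared + 0.5 * rng.normal(size=(3, 16))
D0 = raw / np.linalg.norm(raw, axis=1, keepdims=True)

D = orthogonalize(D0)
before, after = offdiag_energy(D0), offdiag_energy(D)
print("off-diagonal energy before:", before)
print("off-diagonal energy after: ", after)
print("cosine to original rows:   ", np.sum(D * D0, axis=1))
```

The anchor weight trades orthogonality against fidelity to the original directions: a larger value keeps each row closer to where it started and leaves more residual overlap between tasks.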

Theorems & Definitions (3)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3