Orthogonal Model Merging

Sihan Yang; Kexuan Shi; Weiyang Liu

TL;DR

OrthoMerge tackles the problem of merging task-specific finetuned LLMs without eroding the geometric structure of pretrained weights. It performs merging on the orthogonal group: task-specific orthogonal matrices are mapped to the Lie algebra $\mathfrak{so}(d)$, averaged there with a magnitude correction, and mapped back to the orthogonal manifold with the Cayley transform. Models that were not finetuned with OFT are handled via Orthogonal-Residual Decoupling, which separates an explicit orthogonal component from a Euclidean residual. The approach yields consistent improvements over Euclidean-space baselines across language and multimodal domains, reduces catastrophic forgetting, and preserves both general and task-specific capabilities, offering a scalable, geometry-aware way to compose diverse capabilities without retraining.
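
The Lie-algebra merge described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the Cayley convention R = (I + Q)(I - Q)^{-1} with Q skew-symmetric, uniform merge weights by default, and a hypothetical form of magnitude correction (rescaling the averaged generator to the mean Frobenius norm of the task generators). The helper names `cayley`, `inverse_cayley`, and `merge_orthogonal` are illustrative.

```python
import numpy as np

def cayley(Q: np.ndarray) -> np.ndarray:
    """so(d) -> O(d): R = (I + Q)(I - Q)^{-1} for skew-symmetric Q (assumed convention)."""
    I = np.eye(Q.shape[0])
    # I - Q is invertible because a real skew-symmetric Q has purely imaginary eigenvalues.
    return (I + Q) @ np.linalg.inv(I - Q)

def inverse_cayley(R: np.ndarray) -> np.ndarray:
    """O(d) -> so(d): Q = (R - I)(R + I)^{-1}; requires that -1 is not an eigenvalue of R."""
    I = np.eye(R.shape[0])
    return (R - I) @ np.linalg.inv(R + I)

def merge_orthogonal(Rs, weights=None, eps=1e-12):
    """Merge orthogonal matrices by averaging their generators in so(d).

    The magnitude correction (rescale the averaged generator to the weighted
    mean Frobenius norm of the inputs) is an illustrative assumption.
    """
    Qs = [inverse_cayley(R) for R in Rs]
    if weights is None:
        weights = np.full(len(Qs), 1.0 / len(Qs))
    Q_avg = sum(w * Q for w, Q in zip(weights, Qs))  # still skew-symmetric
    target = sum(w * np.linalg.norm(Q) for w, Q in zip(weights, Qs))
    current = np.linalg.norm(Q_avg)
    if current > eps:
        # Plain averaging shrinks the generator when task directions disagree;
        # rescaling restores the typical per-task rotation strength.
        Q_avg = Q_avg * (target / current)
    return cayley(Q_avg)

rng = np.random.default_rng(0)
d = 6

def random_generator(scale=0.1):
    A = scale * rng.standard_normal((d, d))
    return A - A.T  # skew-symmetric, i.e. an element of so(d)

Rs = [cayley(random_generator()) for _ in range(3)]  # stand-ins for OFT task matrices
R_merged = merge_orthogonal(Rs)
print(np.allclose(R_merged.T @ R_merged, np.eye(d)))  # True
```

Because every intermediate result is a linear combination of skew-symmetric matrices, the merged generator stays in $\mathfrak{so}(d)$ and the Cayley transform always returns an exactly orthogonal matrix. When task generators nearly cancel, the rescaling amplifies whatever remains, so the `eps` guard only handles exact cancellation.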

Abstract

Merging finetuned Large Language Models (LLMs) has become increasingly important for integrating diverse capabilities into a single unified model. However, prevailing model merging methods rely on linear arithmetic in Euclidean space, which often destroys intrinsic geometric properties of the pretrained weights, such as hyperspherical energy. To address this, we propose Orthogonal Model Merging (OrthoMerge), a method that performs merging on the Riemannian manifold formed by the orthogonal group, thereby preserving the geometric structure of the model's weights. By mapping the task-specific orthogonal matrices learned by Orthogonal Finetuning (OFT) to the Lie algebra, OrthoMerge enables a principled yet efficient integration that accounts for both the direction and the magnitude of each adaptation. Beyond directly leveraging orthogonal matrices obtained by OFT, we extend this approach to models finetuned with non-OFT methods (e.g., low-rank or full finetuning) via an Orthogonal-Residual Decoupling strategy. This technique extracts the orthogonal component of each expert model by solving the orthogonal Procrustes problem; these components are then merged on the manifold of the orthogonal group, while the remaining linear residuals are combined through standard additive merging. Extensive empirical results demonstrate the effectiveness of OrthoMerge in mitigating catastrophic forgetting and maintaining model performance across diverse tasks.
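
Orthogonal-Residual Decoupling can be sketched as follows. This is a hedged illustration, not the paper's code. It assumes a finetuned weight is modeled as W_t = R_t W_0 + residual, with R_t a d x d orthogonal matrix left-multiplying a pretrained W_0 of shape (d, n) whose columns are neurons. The orthogonal Procrustes solution argmin over orthogonal R of ||R W_0 - W_t||_F is U V^T, where U S V^T is the SVD of W_t W_0^T. The recombination rule W = R_merged W_0 + lam * sum(residuals), the task-arithmetic-style scale `lam`, and the magnitude correction are assumptions; `decouple` and `merge_decoupled` are hypothetical helper names.

```python
import numpy as np

def cayley(Q):
    """so(d) -> O(d): R = (I + Q)(I - Q)^{-1} (assumed convention)."""
    I = np.eye(Q.shape[0])
    return (I + Q) @ np.linalg.inv(I - Q)

def inverse_cayley(R):
    """O(d) -> so(d); assumes R has no eigenvalue -1 (true when R is near I)."""
    I = np.eye(R.shape[0])
    return (R - I) @ np.linalg.inv(R + I)

def decouple(W0, Wt):
    """Split Wt into R @ W0 + residual, with R the orthogonal Procrustes solution."""
    U, _, Vt = np.linalg.svd(Wt @ W0.T)
    R = U @ Vt
    return R, Wt - R @ W0  # what the rotation cannot explain stays Euclidean

def merge_decoupled(W0, experts, lam=1.0, eps=1e-12):
    """Hypothetical recombination: W = R_merged @ W0 + lam * sum(residuals)."""
    parts = [decouple(W0, Wt) for Wt in experts]
    # Orthogonal components: average in so(d), rescale (assumed correction), map back.
    Qs = [inverse_cayley(R) for R, _ in parts]
    Q_avg = sum(Qs) / len(Qs)
    target = np.mean([np.linalg.norm(Q) for Q in Qs])
    current = np.linalg.norm(Q_avg)
    if current > eps:
        Q_avg = Q_avg * (target / current)
    R_merged = cayley(Q_avg)
    # Euclidean residuals: standard additive merging.
    residual = lam * sum(res for _, res in parts)
    return R_merged @ W0 + residual

rng = np.random.default_rng(0)
d, n = 4, 16
W0 = rng.standard_normal((d, n))  # stand-in for a pretrained weight (neurons as columns)

def fake_expert(scale=0.05, noise=0.01):
    A = scale * rng.standard_normal((d, d))
    return cayley(A - A.T) @ W0 + noise * rng.standard_normal((d, n))

experts = [fake_expert() for _ in range(3)]
W_merged = merge_decoupled(W0, experts, lam=0.3)
print(W_merged.shape)  # (4, 16)
```

The sketch relies on W_0 having full row rank, so that the Procrustes rotation is unique; with a single expert and `lam=1.0`, the merge reproduces that expert exactly.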

Paper Structure (27 sections, 14 equations, 9 figures, 10 tables, 2 algorithms)

Introduction
Related Work
OrthoMerge: Orthogonal Model Merging
  Preliminaries
  Orthogonal Merging for OFT-trained Models
  Orthogonal-Residual Decoupling for Merging Non-OFT Models
  Intriguing Insights and Discussions
Experiments and Results
  Merging OFT-Finetuned Models
    Experimental Setup
    Results and Discussion
  Merging Non-OFT Models
    Experimental Setup
    Results and Discussion
  Vision-Language Model Extension
...and 12 more sections

Figures (9)

Figure 1: An intuitive comparison of (a) current model merging with the proposed (b) orthogonal merging and (c) orthogonal-residual decoupling merging.
Figure 2: Illustration of OrthoMerge. (a) To merge orthogonal transformations, we first map them to the Lie algebra $\mathfrak{so}(d)$, perform the merging there with magnitude correction to preserve the strength of the transformations, and finally map the result back to the orthogonal group. (b) For general models, we decouple weights into orthogonal and residual components, merging them separately on the Riemannian manifold formed by the orthogonal group and in Euclidean space, respectively.
Figure 3: Loss landscapes of the base model, Task Arithmetic (TA), and OrthoMerge.
Figure 4: Comparison of norm distributions between different decoupling strategies applied to the models from Section "Merging Non-OFT Models". (a) Norm statistics using the Global Decoupling strategy. (b) Norm statistics using the Conflict-Aware Decoupling strategy.
Figure 5: Performance gain of different merging methods over the base model as a function of the number of merged tasks.
...and 4 more figures
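
Figure 1 contrasts Euclidean merging with orthogonal merging. The abstract names the property at stake: hyperspherical energy. The check below is a minimal sketch of why an orthogonal update preserves it. It assumes the common s = 1 form of hyperspherical energy, E(W) = sum over i != j of 1 / ||w_i/||w_i|| - w_j/||w_j|| ||, and it treats the columns of W as neurons, with R multiplying from the left. Both choices are assumptions made for illustration, and `hyperspherical_energy` is a hypothetical helper. Rotating every neuron by the same R leaves all pairwise angles, and therefore E, unchanged. A generic additive update carries no such guarantee.

```python
import numpy as np

def hyperspherical_energy(W):
    """s = 1 hyperspherical energy, treating the columns of W as neurons (assumed)."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)  # normalize each neuron
    diff = Wn[:, :, None] - Wn[:, None, :]             # (d, n, n) pairwise differences
    dist = np.linalg.norm(diff, axis=0)                # (n, n) pairwise distances
    off_diag = ~np.eye(W.shape[1], dtype=bool)         # drop the i == j terms
    return float(np.sum(1.0 / dist[off_diag]))

rng = np.random.default_rng(0)
d, n = 8, 32
W0 = rng.standard_normal((d, n))

A = 0.1 * rng.standard_normal((d, d))
Q = A - A.T                                         # skew-symmetric generator
R = (np.eye(d) + Q) @ np.linalg.inv(np.eye(d) - Q)  # Cayley transform -> orthogonal R

he_base = hyperspherical_energy(W0)
he_orth = hyperspherical_energy(R @ W0)             # rotate every neuron by R
he_add = hyperspherical_energy(W0 + 0.5 * rng.standard_normal((d, n)))  # additive update

print(np.isclose(he_orth, he_base))  # True
print(he_base, he_add)
```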
