Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging

Zhixiang Wang, Zhenyu Mao, Yixuan Qiao, Yunfang Wu, Biye Li

TL;DR

This work tackles parameter interference in LLM merging by introducing OBIM, which combines saliency-based pruning (Optimal Brain Merging, OBM) with a mutually exclusive, iterative merging procedure (Iterative Merging, IM). OBM approximates parameter saliency with a layer-wise MSE proxy and a diagonal Hessian term, then preserves high-saliency weights and discards the rest. IM uses non-overlapping masks to prevent parameter interference across models. Across SFT and post-pretrained LLMs, OBIM achieves state-of-the-art or highly competitive performance, notably improving results on the GSM8K, MATH, MMLU, and JP-LMEH benchmarks, and it is robust to both the number of merged models and the merging order. The method is computationally efficient, needs little data for saliency estimation, and combines readily with other merging techniques, making it practical for real-world multi-task or multilingual model fusion.

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities, but their high computational costs pose challenges for customization. Model merging offers a cost-effective alternative, yet existing methods suffer from interference among parameters, leading to performance degradation. In this work, we propose Optimal Brain Iterative Merging (OBIM), a novel method designed to mitigate both intra-model and inter-model interference. OBIM consists of two key components: (1) A saliency measurement mechanism that evaluates parameter importance based on loss changes induced by individual weight alterations, reducing intra-model interference by preserving only high-saliency parameters. (2) A mutually exclusive iterative merging framework, which incrementally integrates models using a binary mask to avoid direct parameter averaging, thereby mitigating inter-model interference. We validate OBIM through experiments on both Supervised Fine-Tuned (SFT) models and post-pretrained checkpoints. The results show that OBIM significantly outperforms existing merging techniques. Overall, OBIM provides an effective and practical solution for enhancing LLM merging.
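The two components described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`saliency`, `top_k_mask`, `iterative_merge`), the `keep_ratio` parameter, and the specific saliency formula (squared weight change times the diagonal of the layer-input Gram matrix, a common diagonal-Hessian approximation for a layer-wise MSE objective) are all assumptions made for illustration. It also assumes that each model may only claim positions not already claimed by an earlier model.

```python
import numpy as np

def saliency(delta, calib_inputs):
    """Assumed saliency proxy: delta^2 weighted by a diagonal Hessian term.

    delta:        (out_dim, in_dim) task vector (fine-tuned minus base weights)
    calib_inputs: (n_samples, in_dim) layer inputs from a small calibration set
    """
    # Diagonal of X^T X, i.e. the summed squared activation of each input feature.
    hess_diag = np.sum(calib_inputs ** 2, axis=0)
    return delta ** 2 * hess_diag[None, :]

def top_k_mask(scores, keep_ratio, available):
    """Select the highest-scoring positions among those still available."""
    k = min(int(round(keep_ratio * scores.size)), int(available.sum()))
    mask = np.zeros(scores.shape, dtype=bool)
    if k == 0:
        return mask
    # Positions already claimed by an earlier model can never be selected.
    masked = np.where(available, scores, -np.inf)
    idx = np.argpartition(masked.ravel(), -k)[-k:]
    mask.flat[idx] = True
    return mask

def iterative_merge(base, deltas, saliencies, keep_ratio):
    """Merge task vectors one model at a time with non-overlapping masks."""
    merged = base.copy()
    taken = np.zeros(base.shape, dtype=bool)
    for delta, sal in zip(deltas, saliencies):
        mask = top_k_mask(sal, keep_ratio, ~taken)
        merged[mask] += delta[mask]  # copy the value, no averaging
        taken |= mask
    return merged, taken

# Toy example: two "models" that both care most about position (0, 0).
base = np.zeros((2, 2))
delta_a = np.array([[1.0, 0.1], [0.1, 0.1]])
delta_b = np.array([[2.0, 3.0], [0.1, 0.1]])
calib = np.ones((3, 2))
sals = [saliency(delta_a, calib), saliency(delta_b, calib)]
merged, taken = iterative_merge(base, [delta_a, delta_b], sals, keep_ratio=0.25)
print(merged)  # [[1. 3.] [0. 0.]]
```

In the toy example, position (0, 0) keeps model A's value of 1.0 rather than an average with model B's 2.0, because A claimed it first; B then falls back to its next most salient free position, (0, 1). In this sketch the result depends on the order in which models are visited, whereas the summary above reports that OBIM itself is robust to merging order.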

Paper Structure

This paper contains 31 sections, 8 equations, 3 figures, 11 tables, 1 algorithm.

Figures (3)

  • Figure 1: Illustration of inter-model interference. The dotted box highlights cases where TIES fails to resolve interference. Approximately $46\%$ of parameters deviate from the original models due to task vector averaging in the absence of sign conflicts.
  • Figure 2: An overview of the proposed method. The left part depicts the iterative merging process, while the right part details how parameters are selected at each iteration step through the cooperation of parameter saliency and the merged mask.
  • Figure 3: Performance comparison of merging different numbers of models between OBIM and DARE. The left part presents the average performance across three languages, while the right part shows the results for Japanese capability. The green dotted line represents the best performance of models before merging.