Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging
Zhixiang Wang, Zhenyu Mao, Yixuan Qiao, Yunfang Wu, Biye Li
TL;DR
This work tackles parameter interference that arises when merging LLMs by introducing OBIM, which combines saliency-based pruning (Optimal Brain Merging, OBM) with a mutually exclusive, iterative merging procedure (Iterative Merging, IM). By approximating parameter saliency with a layer-wise MSE proxy loss and a diagonal Hessian term, OBM preserves high-saliency weights and discards the rest, while IM uses non-overlapping masks to prevent interference between the parameters of different models. Across SFT and post-pretrained LLMs, OBIM achieves state-of-the-art or highly competitive performance, with notable gains on GSM8K, MATH, MMLU, and JP-LMEH, and it is robust to both the number of merged models and the merging order. The method is computationally efficient, needs little data for saliency estimation, and combines readily with other merging techniques, making it practical for real-world multi-task or multilingual model fusion.
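The OBM saliency step can be illustrated for a single linear layer. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the proxy loss is the squared error between the layer's outputs under the merged weights and under the fine-tuned weights on some calibration inputs, keeps only the diagonal of that loss's Hessian (2 times the sum over samples of x_nj^2 for input feature j), and scores each delta weight with the Optimal-Brain-Damage-style term 0.5 * H_jj * delta_ij^2, i.e. the predicted loss increase from resetting that weight to its base value. The names `obm_saliency`, `obm_prune`, and `keep_ratio` are hypothetical.

```python
import numpy as np


def obm_saliency(delta: np.ndarray, inputs: np.ndarray) -> np.ndarray:
    """Second-order saliency of each delta weight in one linear layer.

    delta:  (out_features, in_features) array of fine-tuned minus base weights.
    inputs: (num_samples, in_features) calibration activations entering the layer.

    Assumed proxy loss: sum over samples of ||x @ W.T - x @ W_ft.T||^2, expanded
    around W = W_ft (zero gradient there). Its Hessian for every output row is
    2 * X.T @ X; only the diagonal is kept.
    """
    hess_diag = 2.0 * np.sum(inputs ** 2, axis=0)  # shape (in_features,)
    # Predicted loss increase from resetting weight (i, j) to its base value.
    return 0.5 * hess_diag[None, :] * delta ** 2


def obm_prune(delta: np.ndarray, inputs: np.ndarray, keep_ratio: float):
    """Keep the top keep_ratio fraction of delta weights by saliency; zero the rest."""
    saliency = obm_saliency(delta, inputs)
    k = min(delta.size, max(1, int(round(keep_ratio * delta.size))))
    # Flat indices of the k largest saliencies (exactly k, ties broken arbitrarily).
    top = np.argpartition(-saliency.ravel(), k - 1)[:k]
    mask = np.zeros(delta.size, dtype=bool)
    mask[top] = True
    mask = mask.reshape(delta.shape)
    return np.where(mask, delta, 0.0), mask


if __name__ == "__main__":
    delta = np.ones((2, 3))
    inputs = np.array([[10.0, 1.0, 1.0], [10.0, 1.0, 1.0]])
    pruned, mask = obm_prune(delta, inputs, keep_ratio=1 / 3)
    # Only column 0, which sees the largest activations, survives pruning.
    print(pruned)
```

In this sketch the Hessian diagonal depends only on the layer's input activations, so ranking the weights needs just a forward pass over a few calibration samples; equal-sized deltas are kept or dropped according to how strongly their input feature is activated.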
Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities, but their high computational costs pose challenges for customization. Model merging offers a cost-effective alternative, yet existing methods suffer from interference among parameters, leading to performance degradation. In this work, we propose Optimal Brain Iterative Merging (OBIM), a novel method designed to mitigate both intra-model and inter-model interference. OBIM consists of two key components: (1) A saliency measurement mechanism that evaluates parameter importance based on loss changes induced by individual weight alterations, reducing intra-model interference by preserving only high-saliency parameters. (2) A mutually exclusive iterative merging framework, which incrementally integrates models using a binary mask to avoid direct parameter averaging, thereby mitigating inter-model interference. We validate OBIM through experiments on both Supervised Fine-Tuned (SFT) models and post-pretrained checkpoints. The results show that OBIM significantly outperforms existing merging techniques. Overall, OBIM provides an effective and practical solution for enhancing LLM merging.
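The mutually exclusive merging idea can be sketched for a single weight tensor. This is one plausible reading of the abstract, not the paper's exact procedure: it assumes models are processed in a fixed order, each one claims its most salient positions among those not yet claimed by an earlier model, and a binary mask records the claimed positions so that no position ever receives an average of two models' deltas. Saliency scores are taken as given (for example, second-order scores like those described in the TL;DR); `iterative_merge` and `keep_ratio` (the per-model budget) are hypothetical names.

```python
import numpy as np


def iterative_merge(base, deltas, saliencies, keep_ratio):
    """Merge task vectors so that every position comes from at most one model.

    base:       base-model weight tensor.
    deltas:     list of (fine-tuned - base) tensors, in merge order.
    saliencies: list of per-weight saliency scores, same shapes as the deltas.
    keep_ratio: fraction of all positions each model may claim (assumed knob).
    """
    size = base.size
    merged = np.zeros(size)
    occupied = np.zeros(size, dtype=bool)  # binary mask of claimed positions
    budget = min(size, max(1, int(round(keep_ratio * size))))
    for delta, sal in zip(deltas, saliencies):
        free = np.flatnonzero(~occupied)  # positions no earlier model claimed
        if free.size == 0:
            break
        # Rank free positions by this model's saliency, highest first.
        order = np.argsort(-sal.ravel()[free], kind="stable")
        chosen = free[order[: min(budget, free.size)]]
        merged[chosen] = delta.ravel()[chosen]  # copy values, no averaging
        occupied[chosen] = True
    return base + merged.reshape(base.shape), occupied.reshape(base.shape)


if __name__ == "__main__":
    base = np.zeros(4)
    deltas = [np.array([1.0, 2.0, 3.0, 4.0]), np.array([10.0, 20.0, 30.0, 40.0])]
    sals = [np.array([0.0, 0.0, 5.0, 9.0]), np.array([0.0, 0.0, 8.0, 7.0])]
    merged, mask = iterative_merge(base, deltas, sals, keep_ratio=0.5)
    print(merged)  # expected values: 10, 20, 3, 4
```

In the toy run, the first model claims positions 3 and 2, so the second model, whose own top positions are already taken, fills positions 0 and 1. Each output position carries one model's delta unchanged, which is the property the binary mask is meant to guarantee.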
