Sparsity-Aware Evolution for Model Merging
Huan Zhang, Yanjian Zhang, Guillaume Wisniewski, Nadi Tomeh, Bang Liu
TL;DR
The paper addresses destructive interference in multi-parent model merging by introducing sparsity-aware evolution (SAE), which jointly optimizes task performance and structured sparsity within an archive-based, layer-wise merging framework. It defines a merged space $\Theta_{\mathcal{M}}$ and a layer-wise mixing rule with $\lambda_r^{(l)} = \frac{s_A + \omega_A^{(l)}}{(s_A + \omega_A^{(l)}) + (s_B + \omega_B^{(l)})}$, and augments the fitness with sparsity signals to drive a dense-sparse-dense search that promotes modularity. Key contributions include sparsity-induced attraction, annealing sparsification via cyclic schedules, and empirical validation on large-scale LLM benchmarks where SAE outperforms strong baselines like PSO and yields smoother loss landscapes. The approach offers a scalable, data-free path to fuse diverse competencies while constraining interference, with practical impact for robust, multi-task LLM fusion and potential applicability beyond homologous architectures.
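The layer-wise mixing rule above can be illustrated with a minimal sketch. The summary does not define the symbols, so the reading used here is an assumption: $s_A$ and $s_B$ are treated as scalar scores of parents A and B, $\omega_A^{(l)}$ and $\omega_B^{(l)}$ as per-layer adjustment terms, and $\lambda_r^{(l)}$ as the weight given to parent A in a linear interpolation of layer $l$. The function names `mixing_coefficient` and `merge_layer` are hypothetical and layers are modeled as flat lists of floats.

```python
from typing import List, Sequence


def mixing_coefficient(s_a: float, omega_a: float, s_b: float, omega_b: float) -> float:
    """Compute lambda = (s_A + w_A) / ((s_A + w_A) + (s_B + w_B)) for one layer."""
    numerator = s_a + omega_a
    denominator = numerator + (s_b + omega_b)
    if denominator == 0:
        raise ValueError("combined scores sum to zero; lambda is undefined")
    return numerator / denominator


def merge_layer(layer_a: Sequence[float], layer_b: Sequence[float], lam: float) -> List[float]:
    """Assumed merge: linear interpolation with weight lam on parent A."""
    if len(layer_a) != len(layer_b):
        raise ValueError("layers must have the same number of parameters")
    return [lam * a + (1.0 - lam) * b for a, b in zip(layer_a, layer_b)]


# Toy values, chosen only for illustration.
lam = mixing_coefficient(s_a=0.6, omega_a=0.2, s_b=0.1, omega_b=0.1)
merged = merge_layer([1.0, 2.0], [3.0, 4.0], lam)
```

With these toy values the numerator is 0.8 and the denominator is 1.0, so parent A receives a weight of about 0.8 in this layer. A higher score or a larger per-layer term for one parent shifts the merged layer toward that parent.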
Abstract
We propose a sparsity-aware evolutionary (SAE) framework for model merging in which iterative pruning-merging cycles act as a novel mutation operator. We incorporate sparsity constraints into the score function, which steers the evolutionary process to favor sparser models in addition to conventional performance scores. Interestingly, the \textit{competition} for sparsity has a by-product: it introduces an extra local \textit{attraction} and interplay into the evolutionary process. If one competitor has more zero elements, the other competitor's non-zero elements will occupy those positions, even though the less sparse competitor loses to the sparser competitor in other positions. The proposed pipeline is evaluated on a variety of large-scale LLM benchmarks. Experiments demonstrate that our approach can improve model merging reliability across multiple benchmarks, and that it is easy to incorporate because it is simple and orthogonal to most existing approaches.
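The ideas in the abstract can be made concrete with a small sketch. It is not the authors' implementation, and several choices are assumptions: pruning is done by magnitude, the score is a performance value plus `alpha` times the fraction of zero weights, and the attraction effect is modeled as the winner keeping its non-zero entries while the loser fills the winner's zero positions. All function names and the `alpha` parameter are hypothetical, and weights are modeled as flat lists of floats.

```python
from typing import List, Sequence


def sparsity(weights: Sequence[float]) -> float:
    """Fraction of weights that are exactly zero."""
    if not weights:
        raise ValueError("weights must be non-empty")
    return sum(1 for w in weights if w == 0) / len(weights)


def magnitude_prune(weights: Sequence[float], fraction: float) -> List[float]:
    """Zero out roughly the smallest-magnitude `fraction` of weights (assumed pruning rule).

    Ties at the threshold are all pruned, so slightly more than
    `fraction` of the entries may become zero.
    """
    k = int(fraction * len(weights))
    if k <= 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def sparsity_aware_score(performance: float, weights: Sequence[float], alpha: float = 0.1) -> float:
    """Assumed additive score: task performance plus a sparsity bonus."""
    return performance + alpha * sparsity(weights)


def fill_zeros(winner: Sequence[float], loser: Sequence[float]) -> List[float]:
    """Winner keeps its non-zero entries; loser occupies the winner's zero positions."""
    return [l if w == 0 else w for w, l in zip(winner, loser)]


# Toy example: the sparser parent wins, yet the loser fills its zeros.
sparse_parent = magnitude_prune([3.0, -1.0, 0.5, 2.0], fraction=0.5)
dense_parent = [1.0, 1.0, 1.0, 1.0]
child = fill_zeros(sparse_parent, dense_parent)
```

In this toy case the pruned parent keeps its two largest-magnitude weights, and the dense parent contributes only at the two positions that pruning freed up, which mirrors the local attraction described in the abstract.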
