CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging
Zongzhen Yang, Binhang Qi, Hailong Sun, Wenrui Long, Ruobing Zhao, Xiang Gao
TL;DR
CABS is a conflict-aware and balanced sparsification framework for merging multiple task-tuned models. It applies sequential, non-overlapping pruning (CA) and block-based balanced pruning (BS) to task vectors. Theoretical analysis shows that task vectors with non-overlapping supports are orthogonal, which eliminates cross-term interference during merging and lets each task's contribution be scaled independently through its coefficient λ. Extensive experiments on small and large language models across diverse benchmarks show that CABS consistently outperforms existing methods and even surpasses an ideal per-task baseline, supporting its robustness and scalability. In practice, this enables more reliable, high-performance multitask models built by efficient merging without retraining, applicable across architectures and languages. Limitations include the requirement that the merged models share an identical architecture and the need for hyperparameter tuning; future work targets heterogeneous architectures and automated hyperparameter search.
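The orthogonality argument can be sketched in a few lines. The notation here is an assumption made for illustration: $\tau_A$ and $\tau_B$ denote two sparsified task vectors whose sets of nonzero positions are disjoint, and $\lambda_A$, $\lambda_B$ are their merging coefficients.

```latex
% Disjoint supports: at every index i, at least one of the two entries is zero.
\langle \tau_A, \tau_B \rangle = \sum_i \tau_{A,i}\,\tau_{B,i} = 0

% Hence the cross term in the norm of the merged update vanishes:
\|\lambda_A \tau_A + \lambda_B \tau_B\|^2
  = \lambda_A^2 \|\tau_A\|^2
  + 2\,\lambda_A \lambda_B \langle \tau_A, \tau_B \rangle
  + \lambda_B^2 \|\tau_B\|^2
  = \lambda_A^2 \|\tau_A\|^2 + \lambda_B^2 \|\tau_B\|^2
```

With the cross term gone, changing $\lambda_A$ affects only the $\tau_A$ component of the merged update, which is the sense in which the task contributions can be scaled independently.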
Abstract
Model merging based on task vectors, i.e., the parameter differences between fine-tuned models and a shared base model, provides an efficient way to integrate multiple task-specific models into a multitask model without retraining. Recent works have sought to address conflicts between task vectors, one of the major challenges in model merging, through sparsification; however, two issues significantly limit their performance: high parameter overlap and unbalanced weight distribution. To address these issues, we propose a simple yet effective framework called CABS (Conflict-Aware and Balanced Sparsification), consisting of Conflict-Aware Sparsification (CA) and Balanced Sparsification (BS). CA reduces parameter overlap by applying masks during sequential pruning, ensuring that each task vector retains distinct, non-overlapping parameters. BS leverages $n$:$m$ pruning to preserve critical weights while maintaining an even distribution across layers. Our comprehensive experiments demonstrate that CABS outperforms state-of-the-art methods across diverse tasks and model sizes.
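To make the two components concrete, the following is a minimal sketch of how $n$:$m$ magnitude pruning and sequential masking could fit together for two task vectors. It is an illustration under stated assumptions, not the authors' implementation: the function names (`nm_mask`, `conflict_aware_sparsify`, `merge`) are hypothetical, task vectors are plain Python lists rather than model tensors, and the pruning order (A first, then B) is chosen arbitrarily.

```python
def nm_mask(vec, n, m):
    """Boolean mask keeping the n largest-magnitude entries in each block of m."""
    if len(vec) % m != 0:
        raise ValueError("vector length must be a multiple of m")
    mask = [False] * len(vec)
    for start in range(0, len(vec), m):
        block = range(start, start + m)
        # Indices of the n largest magnitudes inside this block.
        keep = sorted(block, key=lambda i: abs(vec[i]), reverse=True)[:n]
        for i in keep:
            mask[i] = True
    return mask


def conflict_aware_sparsify(tau_a, tau_b, n, m):
    """Prune tau_a with n:m, then prune tau_b only among positions A did not keep."""
    mask_a = nm_mask(tau_a, n, m)
    # Assumption: positions already retained by A are zeroed out before pruning B.
    remaining_b = [0.0 if ka else w for ka, w in zip(mask_a, tau_b)]
    raw_mask_b = nm_mask(remaining_b, n, m)
    # Guard against a zeroed position being selected when a block has few free slots.
    mask_b = [kb and not ka for ka, kb in zip(mask_a, raw_mask_b)]
    sparse_a = [w if k else 0.0 for w, k in zip(tau_a, mask_a)]
    sparse_b = [w if k else 0.0 for w, k in zip(tau_b, mask_b)]
    return sparse_a, sparse_b


def merge(base, sparse_a, sparse_b, lam_a, lam_b):
    """Add the independently scaled sparse task vectors to the base parameters."""
    return [w + lam_a * a + lam_b * b for w, a, b in zip(base, sparse_a, sparse_b)]
```

Because every block keeps the same number of entries, the retained weights are spread evenly instead of clustering in a few regions, and because B is pruned only where A kept nothing, the two sparse vectors never share a nonzero position.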
