Revisiting Weight Averaging for Model Merging
Jiho Choi, Donggyun Kim, Chanhyuk Lee, Seunghoon Hong
TL;DR
This work revisits weight averaging for model merging by reframing it as task arithmetic centered at the weight average. It shows that centering, combined with a low-rank approximation of the task vectors, substantially reduces inter-task interference. The authors introduce CART (Centered Arithmetic with Rank-reduced Task vectors), a training-free merging method that keeps only the top-k singular vectors of each centered task vector (the difference between a fine-tuned model and the weight average). CART achieves robust gains across vision and NLP benchmarks and across backbone sizes. The authors provide theoretical and empirical evidence linking reduced row-space interference to improved merging, and show that retaining roughly 8% of the full rank consistently yields strong performance. CART can be integrated with existing task-arithmetic approaches and extended to test-time adaptation or model compression, offering a practical, scalable way to merge multi-task models without additional training data.
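The reframing can be written compactly. The notation below is assumed for illustration and is not taken from this summary: \theta_t is the weight matrix of the model fine-tuned on task t (one layer at a time), T is the number of tasks, \mathrm{SVD}_k(\cdot) denotes the best rank-k approximation, and \lambda is a hypothetical scaling coefficient.

```latex
\theta_{\mathrm{avg}} = \frac{1}{T}\sum_{t=1}^{T}\theta_t,
\qquad
\tau_t = \theta_t - \theta_{\mathrm{avg}},
\qquad
\sum_{t=1}^{T}\tau_t = 0

\theta_{\mathrm{merged}} = \theta_{\mathrm{avg}} + \lambda\sum_{t=1}^{T}\mathrm{SVD}_k(\tau_t)
```

Because the centered task vectors sum to zero, adding them back at full rank simply returns the weight average. The sum becomes non-trivial only once each \tau_t is truncated to its top-k singular components.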
Abstract
Model merging aims to build a multi-task learner by combining the parameters of individually fine-tuned models without additional training. While a straightforward approach is to average model parameters across tasks, this often results in suboptimal performance due to interference among parameters across tasks. In this paper, we present intriguing results showing that weight averaging implicitly induces task vectors centered around the weight average itself, and that applying a low-rank approximation to these centered task vectors significantly improves merging performance. Our analysis shows that centering the task vectors effectively reduces task interference and that most task-specific knowledge is concentrated in the top singular vectors. Our method demonstrates robust and scalable performance on vision benchmarks across varying numbers of tasks and model sizes. Furthermore, we observe that our approach is applicable to natural language processing tasks, where it achieves competitive performance.
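The idea described in the abstract can be sketched for a single layer as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the `rank_ratio` and `scale` parameters, and the choice to treat each layer as one 2-D matrix are all hypothetical.

```python
import numpy as np


def low_rank(matrix, k):
    """Return the best rank-k approximation of a 2-D array via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def merge_centered_low_rank(finetuned, rank_ratio=0.08, scale=1.0):
    """Merge same-shaped 2-D weight matrices, one per task.

    Assumed recipe (hypothetical names and defaults):
      1. center = element-wise average of the fine-tuned weights
      2. centered task vector = fine-tuned weight - center
      3. keep only the top-k singular components of each centered vector
      4. add the scaled sum of those low-rank pieces back to the center
    """
    center = np.mean(np.stack(finetuned), axis=0)
    full_rank = min(center.shape)
    k = max(1, int(round(rank_ratio * full_rank)))
    update = sum(low_rank(w - center, k) for w in finetuned)
    return center + scale * update
```

With `rank_ratio=1.0` the centered task vectors cancel and the function returns the plain weight average, so any difference from averaging comes entirely from the rank truncation. In practice `scale` and `rank_ratio` would need to be chosen per setting; the default of 0.08 only mirrors the figure quoted in the TL;DR.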
