From Task-Specific Models to Unified Systems: A Review of Model Merging Approaches
Wei Ruan, Tianze Yang, Yifan Zhou, Tianming Liu, Jin Lu
TL;DR
Motivated by settings where the original training data are inaccessible, the paper surveys model merging as a data-free path to cross-task generalization. It introduces a taxonomy covering permutation-type, direct merging, magnitude- and activation-based pruning, optimization-based, and LoRA-based approaches, highlighting alignment, interference suppression, and dynamic experts, with examples such as $LAP$, $CTL$, Fisher-weighted schemes, and LoraHub. It analyzes techniques in both weight space and activation space, discusses the role of Mixture of Experts and task vectors, and identifies key gaps in theory and architecture-aware design. The work emphasizes future directions that combine model compression, dual-space constraints, and clustering of experts to improve scalability and robustness. Overall, it aims to give newcomers a clear view of the landscape and to spur innovations in unified, data-efficient merging strategies.
Abstract
Model merging has achieved significant success, with numerous innovative methods proposed to enhance capabilities by combining multiple models. However, the field lacks a unified framework for classification and systematic comparative analysis, which has led to inconsistent terminology and categorization. Meanwhile, although an increasing number of fine-tuned models are publicly available, their original training data often remain inaccessible due to privacy concerns or intellectual property restrictions, which makes traditional multi-task learning based on shared training data impractical. In such scenarios, merging model parameters to create a unified model with broad generalization across multiple domains becomes crucial, further underscoring the importance of model merging techniques. Despite rapid progress in this field, a comprehensive taxonomy and survey that summarizes recent advances and anticipates future directions is still lacking. This paper addresses these gaps by establishing a new taxonomy of model merging methods, systematically comparing different approaches, and providing an overview of key developments. By offering a structured perspective on this evolving area, we aim to help newcomers quickly grasp the field's landscape and inspire further innovations.
