From Task-Specific Models to Unified Systems: A Review of Model Merging Approaches

Wei Ruan, Tianze Yang, Yifan Zhou, Tianming Liu, Jin Lu

TL;DR

Because the original training data behind many fine-tuned models is inaccessible, the paper surveys model merging as a data-free path to cross-task generalization. It introduces a taxonomy covering permutation-type, direct merging, magnitude- and activation-based pruning, optimization-based, and LoRA-based approaches. The taxonomy highlights alignment, interference suppression, and dynamic experts, with examples such as $LAP$, $CTL$, Fisher-weighted schemes, and LoraHub. The paper analyzes techniques in both weight space and activation space, discusses the role of Mixture of Experts and task vectors, and identifies key gaps in theory and architecture-aware design. It points to future directions that combine model compression, dual-space constraints, and clustering of experts to improve scalability and robustness. Overall, it aims to give newcomers a clear view of the landscape and to spur innovation in unified, data-efficient merging strategies.

Abstract

Model merging has achieved significant success, with numerous innovative methods proposed to enhance capabilities by combining multiple models. However, challenges persist due to the lack of a unified framework for classification and systematic comparative analysis, leading to inconsistencies in terminologies and categorizations. Meanwhile, as an increasing number of fine-tuned models are publicly available, their original training data often remain inaccessible due to privacy concerns or intellectual property restrictions. This makes traditional multi-task learning based on shared training data impractical. In scenarios where direct access to training data is infeasible, merging model parameters to create a unified model with broad generalization across multiple domains becomes crucial, further underscoring the importance of model merging techniques. Despite the rapid progress in this field, a comprehensive taxonomy and survey summarizing recent advances and predicting future directions are still lacking. This paper addresses these gaps by establishing a new taxonomy of model merging methods, systematically comparing different approaches, and providing an overview of key developments. By offering a structured perspective on this evolving area, we aim to help newcomers quickly grasp the field's landscape and inspire further innovations.

Paper Structure

This paper contains 15 sections, 1 equation, 3 figures, 1 table.

Figures (3)

  • Figure 1: An overview of common model merging methods and their main distinctions.
  • Figure 2: The cartoon diagram illustrates the principle of task arithmetic.
  • Figure 3: A schematic representation illustrating the drop principles in two model merging methods: TIES-MERGING and Model Breadcrumbs. TIES-MERGING focuses on dropping low-magnitude weights, while Model Breadcrumbs applies predefined thresholds to balance sparsity and parameter retention.
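The ideas named in the captions for Figures 2 and 3 can be illustrated with a small sketch. It is not the paper's implementation: it treats each model as a flat list of floats, and the function names, the `scale` coefficient, and the `keep_ratio` parameter are all hypothetical. It shows task arithmetic (adding scaled task vectors to the pretrained weights) and only the low-magnitude trimming step associated with TIES-MERGING; the other steps of that method and the thresholding used by Model Breadcrumbs are not reproduced here.

```python
def task_vector(finetuned, pretrained):
    """Task vector: element-wise difference between fine-tuned and pretrained weights."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def trim_low_magnitude(vector, keep_ratio):
    """Keep the largest-magnitude entries and zero out the rest (assumed trimming rule)."""
    k = max(1, round(len(vector) * keep_ratio))
    # Indices of the k entries with the largest absolute value.
    ranked = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


def merge(pretrained, finetuned_models, scale=1.0, keep_ratio=1.0):
    """Add the scaled, optionally trimmed task vector of each model to the pretrained weights."""
    merged = list(pretrained)
    for finetuned in finetuned_models:
        tau = trim_low_magnitude(task_vector(finetuned, pretrained), keep_ratio)
        merged = [m + scale * t for m, t in zip(merged, tau)]
    return merged


# Toy weights chosen so the arithmetic is exact in floating point.
pretrained = [1.0, 1.0, 1.0, 1.0]
model_a = [3.0, 1.5, 1.0, 1.0]
model_b = [1.0, 1.0, 0.5, -1.0]

plain = merge(pretrained, [model_a, model_b])
trimmed = merge(pretrained, [model_a, model_b], keep_ratio=0.25)
```

With `keep_ratio=0.25`, only the single largest change from each fine-tuned model survives, so the small updates that might interfere across tasks are discarded before the task vectors are summed.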