Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions

Mingyang Song; Mao Zheng

TL;DR

This survey examines model merging in the LLM era through the FUSE taxonomy, a four-dimensional framework organized along Foundations, Unification Strategies, Scenarios, and Ecosystem, to give researchers and practitioners a structured foundation for advancing model merging.

Abstract

Model merging has emerged as a transformative paradigm for combining the capabilities of multiple neural networks into a single unified model without additional training. With the rapid proliferation of fine-tuned large language models (LLMs), merging techniques offer a computationally efficient alternative to ensembles and full retraining, enabling practitioners to compose specialized capabilities at minimal cost. This survey presents a comprehensive and structured examination of model merging in the LLM era through the FUSE taxonomy, a four-dimensional framework organized along Foundations, Unification Strategies, Scenarios, and Ecosystem. We first establish the theoretical underpinnings of merging, including loss landscape geometry, mode connectivity, and the linear mode connectivity hypothesis. We then systematically review the algorithmic landscape, spanning weight averaging, task vector arithmetic, sparsification-enhanced methods, mixture-of-experts architectures, and evolutionary optimization approaches. For each method family, we analyze the core formulation, highlight representative works, and discuss practical trade-offs. We further examine downstream applications across multi-task learning, safety alignment, domain specialization, multilingual transfer, and federated learning. Finally, we survey the supporting ecosystem of open-source tools, community platforms, and evaluation benchmarks, and identify key open challenges including theoretical gaps, scalability barriers, and standardization needs. This survey aims to equip researchers and practitioners with a structured foundation for advancing model merging.

Paper Structure (50 sections, 1 theorem, 16 equations, 4 figures, 8 tables, 1 algorithm)

Introduction
Theoretical Foundations of Model Merging
Loss Landscape Geometry and Convexity Properties in Neural Networks
Linear Mode Connectivity Theory and Basin Characteristics
Weight Space Symmetries and Permutation Invariance
Prerequisites and Conditions for Successful Model Merging
Open Theoretical Questions and Research Frontiers
Weight-Space Averaging and Geometric Interpolation Methods
Linear Averaging and Static Model Soups
Importance-Weighted Averaging and Fisher Information
Limitations and failure modes.
Trajectory-Based Averaging and Optimization Dynamics
Limitations.
Geometric Interpolation and Manifold-Aware Methods
Task Vector Arithmetic and Sparsification-Enhanced Methods
...and 35 more sections

Key Result

Proposition 1

Let $\theta_1, \theta_2$ be two models with shared pretrained initialization $\theta_0$, and let $\theta_{\text{avg}} = \frac{1}{2}(\theta_1 + \theta_2)$. Under $L$-smoothness of the loss function $\mathcal{L}$, the performance degradation of the averaged model is bounded by:

Figures (4)

Figure 1: Overview of the model merging pipeline. Task-specific models $\{\theta_1,\dots,\theta_K\}$, obtained by fine-tuning a shared pretrained checkpoint $\theta_0$, are combined via one of several strategies (Sections 3--5) into a single model $\theta_{\mathrm{merged}}$, which is then evaluated on all source tasks. Gray vertical bars represent the many-to-many mapping between model inputs and merging strategies.
Figure 2: The FUSE taxonomy. Foundations: why merging works. Unification: how models are combined. Scenarios: where merging applies. Ecosystem: what supports deployment.
Figure 3: Four fundamental task vector operations. (a) Addition combines multiple task vectors into a single model. (b) Negation subtracts a task vector to remove unwanted behavior. (c) Scaling controls task influence via $\lambda$. (d) Analogy transfers the relationship between two tasks.
Figure 4: Evolution of model merging techniques. The field progressed from foundational loss landscape theory (2018--2020), through weight-space averaging (2022) and interference-aware task vectors (2023), to adaptive/evolutionary optimization and standardized ecosystem tooling (2024--2025). Arrows indicate the conceptual flow and inheritance of ideas across phases.
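
The four task vector operations named in the Figure 3 caption can be illustrated with a minimal sketch. It assumes the common formulation in which a task vector is the element-wise difference between fine-tuned and pretrained weights, and in which an analogy is formed as $\tau_C + (\tau_B - \tau_A)$. The flat dictionary weight store, the function names, and the toy numbers are hypothetical and chosen only for illustration; they are not taken from the survey.

```python
from typing import Dict, List

# Hypothetical flat parameter store: parameter name -> list of values.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, pretrained: Weights) -> Weights:
    """Task vector tau = theta_task - theta_0, computed per parameter."""
    return {k: [f - p for f, p in zip(finetuned[k], pretrained[k])]
            for k in pretrained}


def apply_vectors(pretrained: Weights, vectors: List[Weights],
                  lam: float = 1.0) -> Weights:
    """Addition and scaling: theta_0 + lam * (sum of task vectors)."""
    merged = {}
    for k, base in pretrained.items():
        total = [sum(v[k][i] for v in vectors) for i in range(len(base))]
        merged[k] = [b + lam * t for b, t in zip(base, total)]
    return merged


def negate(vector: Weights) -> Weights:
    """Negation: flip the sign so that adding it removes a behavior."""
    return {k: [-x for x in vals] for k, vals in vector.items()}


def analogy(tau_a: Weights, tau_b: Weights, tau_c: Weights) -> Weights:
    """Analogy (assumed form): tau_c + (tau_b - tau_a)."""
    return {k: [c + (b - a) for a, b, c in zip(tau_a[k], tau_b[k], tau_c[k])]
            for k in tau_c}


# Toy weights with a single two-element parameter "w".
theta_0 = {"w": [1.0, 2.0]}
theta_a = {"w": [1.5, 2.0]}
theta_b = {"w": [1.0, 2.5]}

tau_a = task_vector(theta_a, theta_0)   # {"w": [0.5, 0.0]}
tau_b = task_vector(theta_b, theta_0)   # {"w": [0.0, 0.5]}
tau_c = {"w": [0.25, 0.25]}

added = apply_vectors(theta_0, [tau_a, tau_b])              # (a) addition
removed = apply_vectors(theta_0, [negate(tau_a)])           # (b) negation
scaled = apply_vectors(theta_0, [tau_a, tau_b], lam=0.5)    # (c) scaling
tau_d = analogy(tau_a, tau_b, tau_c)                        # (d) analogy
```

In practice the same arithmetic would be applied tensor by tensor over a model's state dictionary, with $\lambda$ typically tuned on held-out data.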

Theorems & Definitions (1)

Proposition 1 (Merging Error Bound) [DBLP:journals/corr/abs-2406-16300]
