Deep Model Fusion: A Survey

Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen

TL;DR

This survey addresses deep model fusion, a framework to merge multiple deep networks to improve accuracy, robustness, and efficiency. It organizes approaches into four families: mode connectivity, alignment, weight averaging, and ensemble learning, and covers methods, theory, and applications including Federated Learning, fine-tuning, distillation, and foundation-model fusion. The work highlights practical benefits, limitations, and pivotal challenges such as computational cost, heterogeneity handling, and scalability. It also outlines future directions, including scalable alignment, subspace fusion, and adaptive fusion for large-scale, heterogeneous systems.

Abstract

Deep model fusion/merging is an emerging technique that merges the parameters or predictions of multiple deep learning models into a single one. It combines the abilities of different models to compensate for the biases and errors of any single model and thereby achieve better performance. However, deep model fusion on large-scale deep learning models (e.g., LLMs and foundation models) faces several challenges, including high computational cost, a high-dimensional parameter space, and interference between heterogeneous models. Although model fusion has attracted widespread attention due to its potential to solve complex real-world tasks, a complete and detailed survey of the technique is still lacking. To better understand model fusion methods and promote their development, we present a comprehensive survey that summarizes recent progress. Specifically, we group existing deep model fusion methods into four categories: (1) "Mode connectivity", which connects solutions in weight space via a path of non-increasing loss in order to obtain a better initialization for model fusion; (2) "Alignment", which matches units between neural networks to create better conditions for fusion; (3) "Weight average", a classical model fusion method that averages the weights of multiple models to obtain more accurate results closer to the optimal solution; and (4) "Ensemble learning", which combines the outputs of diverse models and is a foundational technique for improving the accuracy and robustness of the final model. In addition, we analyze the challenges faced by deep model fusion and propose possible directions for future research. Our review helps clarify the relationships between different model fusion methods and their practical applications, and can inform further research in the field of deep model fusion.
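The difference between fusing parameters ("Weight average") and fusing predictions ("Ensemble learning") can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the survey: the toy one-hidden-layer network, the parameter names `W1` and `W2`, and the helper functions `average_weights`, `predict`, and `ensemble_predict`.

```python
import numpy as np

def average_weights(models):
    """Parameter-space fusion: element-wise mean of each named parameter."""
    keys = models[0].keys()
    return {k: np.mean([m[k] for m in models], axis=0) for k in keys}

def predict(model, x):
    """Toy one-hidden-layer network (hypothetical): tanh(x @ W1) @ W2."""
    return np.tanh(x @ model["W1"]) @ model["W2"]

def ensemble_predict(models, x):
    """Output-space fusion: mean of the individual models' predictions."""
    return np.mean([predict(m, x) for m in models], axis=0)

# Two randomly initialized models with identical architecture.
rng = np.random.default_rng(0)
models = [
    {"W1": rng.normal(size=(3, 4)), "W2": rng.normal(size=(4, 2))}
    for _ in range(2)
]
x = rng.normal(size=(5, 3))

fused = average_weights(models)      # a single model; one forward pass at inference
y_avg = predict(fused, x)            # prediction of the weight-averaged model
y_ens = ensemble_predict(models, x)  # requires a forward pass through every model
```

Because the network is nonlinear, `y_avg` and `y_ens` generally differ: weight averaging yields one model with the inference cost of a single network, while ensembling keeps every model and averages only the outputs.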

Paper Structure (28 sections, 45 equations, 8 figures, 9 tables)

This paper contains 28 sections, 45 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: Schematic diagram of the overall model fusion process, together with the classification of fusion methods and the connections between them.
  • Figure 2: Schematic diagram of mode connectivity in a two-dimensional loss landscape and in other-dimensional subspaces. Left: Linear interpolation between the minima of two basins results in high-loss barriers [draxler2018essentially]. The lower two optima are connected by a path of nearly constant low loss (e.g., a Bezier curve or polygonal chain) [garipov2018loss]. $\pi(W_{2})$ is the model equivalent to $W_2$ under permutation symmetry, and it lies in the same basin as $W_1$. Re-Basin merges models by delivering solutions to individual basins [ainsworth2022git]. Right: Low-loss paths connect multiple minima in a subspace (e.g., a low-loss manifold composed of $d$-dim wedges [fort2019large]).
  • Figure 3: Left: general alignment process. Model $A$ is transformed into model $A_{p}$ by reference to model $B$. Then the linear combination of $A_{p}$ and $B$ produces $C$. Right: the parameter vectors $\vartheta_{m}$, $\vartheta_{n}$ of two neurons in different hidden layers are adjusted until they are close to the replacement point [brea2019weight]. At the replacement point, $\vartheta_{m}^{\prime}=\vartheta_{n}^{\prime}$ and the two neurons compute the same function, which means that the two neurons can be exchanged.
  • Figure 4: Comparison of the sampling and learning rate schedules of different SWA-related methods. (a) SWA: constant learning rates. (b) SWA: cyclical learning rates $\textbf{c}$. (c) SWAD: dense sampling. (d) HWA: leverages both online and offline WA, sampling at different synchronization cycles with a sliding window of length $h$, i.e. $\overline{\overline{w_{i}}}=\frac{\sum_{t=i-h+1}^{i} \overline{w_{t}}}{h}$.
  • Figure 5: The flow chart of Task Arithmetic and LoRA Hub [huang2023lorahub] in multi-task scenarios.
  • ...and 3 more figures
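
The sliding-window average in the Figure 4 caption, $\overline{\overline{w_{i}}}=\frac{\sum_{t=i-h+1}^{i} \overline{w_{t}}}{h}$, can be sketched as follows. This is an illustration under stated assumptions rather than the HWA implementation: the function name `sliding_window_average` is hypothetical, indices are 0-based, and each entry of `checkpoints` is assumed to be an already-averaged weight vector $\overline{w_{t}}$ stored as a flat NumPy array.

```python
import numpy as np

def sliding_window_average(checkpoints, i, h):
    """Mean of the h most recent averaged checkpoints ending at index i.

    Mirrors the caption formula: sum over t = i-h+1 .. i of w_bar_t, divided by h.
    Indices are 0-based here (an assumption of this sketch).
    """
    if h < 1:
        raise ValueError("window length h must be at least 1")
    if i >= len(checkpoints) or i - h + 1 < 0:
        raise ValueError("window [i-h+1, i] falls outside the checkpoint list")
    window = checkpoints[i - h + 1 : i + 1]
    return np.mean(window, axis=0)

# Toy checkpoints: w_bar_t is a 3-dimensional vector filled with the value t.
checkpoints = [np.full(3, float(t)) for t in range(6)]

# Window of length 3 ending at index 5 covers t = 3, 4, 5, so the mean is 4.
w = sliding_window_average(checkpoints, i=5, h=3)
```

The explicit bounds check matters because a Python slice with a negative start would silently wrap around instead of raising an error.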