EMR-Merging: Tuning-Free High-Performance Model Merging

Chenyu Huang, Peng Ye, Tao Chen, Tong He, Xiangyu Yue, Wanli Ouyang

TL;DR

This paper proposes Elect, Mask & Rescale-Merging (EMR-Merging), a tuning-free model merging method that outperforms existing merging methods under both classical and newly established settings, including merging different numbers of vision models, NLP models, PEFT models, and multi-modal models.

Abstract

The success of the pretrain-finetune paradigm has led to the release of numerous model weights. Consequently, merging models fine-tuned on different tasks into a single model with multi-task capabilities is gaining increasing attention for its practicality. Existing model merging methods usually either (1) suffer significant performance degradation or (2) require tuning with additional data or training. In this paper, we rethink and analyze the existing model merging paradigm. We find that a single set of model weights can hardly reproduce the performance of all the individual models. To tackle this issue, we propose Elect, Mask & Rescale-Merging (EMR-Merging). We first (a) elect a unified model from all the model weights and then (b) generate extremely lightweight task-specific modulators, including masks and rescalers, to align the direction and the magnitude, respectively, between the unified model and each task-specific model. EMR-Merging is tuning-free: it requires neither data nor additional training, yet it still performs strongly. We find that EMR-Merging outperforms existing merging methods under both classical and newly established settings, including merging different numbers of vision models (up to 30), NLP models, PEFT models, and multi-modal models.
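The abstract describes the elect, mask, and rescale steps only at a high level. The following is a minimal sketch of one way they could be realized on flattened weight vectors in plain Python. It assumes that task vectors are fine-tuned weights minus pretrained weights, that the election takes the sign of the summed task vectors together with the largest magnitude among agreeing tasks, and that each rescaler matches L1 norms. These rules and the names `emr_merge` and `task_weights` are hypothetical illustrations, not details confirmed by this page.

```python
from typing import List, Tuple

Vector = List[float]  # one flattened weight tensor (illustrative representation)


def sign(x: float) -> int:
    """Return 1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def emr_merge(pretrained: Vector, finetuned: List[Vector]) -> Tuple[Vector, List[List[int]], List[float]]:
    """Hypothetical sketch: elect a unified task vector, then build per-task masks and rescalers."""
    # Task vectors: fine-tuned weights minus pretrained weights (assumed definition).
    task_vectors = [[w - p for w, p in zip(ft, pretrained)] for ft in finetuned]

    # (a) Elect (assumed rule): the unified sign is the sign of the summed task vectors,
    # and the unified magnitude is the largest |value| among tasks agreeing with that sign.
    unified: Vector = []
    for j in range(len(pretrained)):
        column = [tv[j] for tv in task_vectors]
        s = sign(sum(column))
        agreeing = [abs(v) for v in column if sign(v) == s]
        unified.append(s * max(agreeing, default=0.0))

    masks: List[List[int]] = []
    rescalers: List[float] = []
    for tv in task_vectors:
        # (b) Mask: keep only entries where this task vector and the unified vector share a sign.
        mask = [1 if t * u > 0 else 0 for t, u in zip(tv, unified)]
        # (c) Rescale (assumed rule): scale so the masked unified vector has the task vector's L1 norm.
        masked_l1 = sum(m * abs(u) for m, u in zip(mask, unified))
        task_l1 = sum(abs(t) for t in tv)
        rescalers.append(task_l1 / masked_l1 if masked_l1 > 0 else 0.0)
        masks.append(mask)
    return unified, masks, rescalers


def task_weights(pretrained: Vector, unified: Vector, mask: List[int], rescaler: float) -> Vector:
    """Inference: approximate one task's weights as pretrained + rescaler * (mask * unified)."""
    return [p + rescaler * m * u for p, m, u in zip(pretrained, mask, unified)]


if __name__ == "__main__":
    pre = [0.0, 0.0, 0.0]
    ft = [[1.0, -2.0, 0.5], [3.0, 1.0, -0.5]]
    unified, masks, rescalers = emr_merge(pre, ft)
    print(unified, masks, rescalers)
    print(task_weights(pre, unified, masks[0], rescalers[0]))
```

With the toy inputs above, the second entry is kept for the first task but masked out for the second, because the second task's value disagrees with the elected sign. Only one unified vector is stored, plus one binary mask and one scalar per task, which is why such modulators can be much cheaper to keep than full per-task weights.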

Paper Structure

This paper contains 33 sections, 13 equations, 9 figures, 15 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Average multi-task accuracy of different model merging methods on eight vision tasks. Among all the merging methods, only EMR-Merging performs comparably to multi-task learning (MTL) and even to the individual models.
  • Figure 3: Framework overview. In the (a) Merging Procedure, we merge the task-specific vectors into a unified task vector and lightweight task-specific modulators that modulate direction and amplitude. During the (b) Inference Procedure, we apply the corresponding mask and rescaler to the unified task vector to obtain a task-specific vector. The (c) Task-specific Direction and Amplitude Modulation process produces the task-specific masks and rescalers.
  • Figure 4: Partial (a) t-SNE and (b) Grad-CAM visualization results of EMR-Merging's procedures.
  • Figure 5: Comparison of (a) sign conflicts, (b) L2 distance, and (c) cosine similarity between the model weights obtained by different methods and the task-specific model weights.
  • Figure 6: Partial visualization results of different merging methods, (a) t-SNE and (b) Grad-CAM.
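Figure 5 compares merged weights with each task-specific model's weights on three measures. The sketch below shows one straightforward way to compute such measures on flattened weight vectors in plain Python. This page does not say whether the comparison uses raw weights or task vectors, or how values are aggregated across layers, so the sign-conflict rule here (both entries nonzero with opposite signs) and the function names are assumptions for illustration.

```python
import math
from typing import List

Vector = List[float]  # flattened weights (illustrative representation)


def sign_conflicts(a: Vector, b: Vector) -> int:
    """Count positions where a and b are both nonzero with opposite signs (assumed definition)."""
    return sum(1 for x, y in zip(a, b) if x * y < 0)


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two weight vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    merged = [1.0, -2.0, 3.0]
    task_specific = [1.0, 2.0, -3.0]
    print(sign_conflicts(merged, task_specific))
    print(l2_distance(merged, task_specific))
    print(cosine_similarity(merged, task_specific))
```

Under these definitions, a merged model that better matches a task-specific model shows fewer sign conflicts, a smaller L2 distance, and a cosine similarity closer to 1.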