Reinforced Model Merging

Jiaqi Han, Jingwen Ye, Shunyu Liu, Haofei Zhang, Jie Song, Zunlei Feng, Mingli Song

TL;DR

This work tackles the challenge of efficiently merging multiple pre-trained models without gradient access by formulating the task as a reinforcement learning problem. It introduces Reinforced Model Merging (RMM), in which a merging agent and an environment interact through layer-wise actions, with the agent optimized by PPO, and a Dynamic Average Reward (DAR) mechanism that reduces evaluation cost. DAR makes the search up to 100× faster while RMM achieves state-of-the-art performance on both vision and NLP benchmarks, including ViT and T5-based setups. The approach is practical for edge devices and flexible across merging algorithms and base models, with potential extensions to multi-modal and heterogeneous settings.

Abstract

The success of large language models has garnered widespread attention for model merging techniques, especially training-free methods that combine model capabilities within the parameter space. However, two challenges remain: (1) uniform treatment of all parameters leads to performance degradation; (2) search-based algorithms are often inefficient. In this paper, we present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and an agent tailored for merging tasks. These components interact to execute layer-wise merging actions, aiming to search for the optimal merging architecture. Notably, RMM operates without any gradient computations on the original models, rendering it feasible for edge devices. Furthermore, by utilizing data subsets during the evaluation process, we address the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times. Extensive experiments demonstrate that RMM achieves state-of-the-art performance across various vision and NLP datasets and effectively overcomes the limitations of existing baseline methods. Our code is available at https://github.com/WuDiHJQ/Reinforced-Model-Merging.
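
The abstract's idea of evaluating on data subsets can be illustrated with a minimal sketch. It is not the authors' implementation: `subset_reward`, `noisy_predict`, and the toy parity dataset are hypothetical names introduced here, and accuracy on a uniformly sampled subset is only an assumed form of the reward.

```python
import random

def subset_reward(predict, dataset, subset_size, rng):
    """Accuracy of `predict` on a random subset of `dataset`.

    `dataset` is a list of (input, label) pairs. Sampling a subset
    trades a noisier reward estimate for fewer model evaluations.
    """
    sample = rng.sample(dataset, min(subset_size, len(dataset)))
    correct = sum(1 for x, y in sample if predict(x) == y)
    return correct / len(sample)

# Toy data (assumption for illustration): the label is the parity of the input.
dataset = [(i, i % 2) for i in range(10_000)]

def noisy_predict(x):
    # Stand-in for a merged model: wrong on every multiple of 10.
    return x % 2 if x % 10 else 1 - x % 2

rng = random.Random(0)
full = subset_reward(noisy_predict, dataset, len(dataset), rng)  # 10,000 predictions
fast = subset_reward(noisy_predict, dataset, 100, rng)           # 100 predictions
print(f"full-set reward: {full:.3f}, subset reward: {fast:.3f}")
```

In this toy setup the subset evaluation makes 100 predictions instead of 10,000, at the price of a noisier estimate; how the paper controls that noise is not reproduced here.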

Paper Structure

This paper contains 20 sections, 10 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Comparison with prior methods. (a) Prior training-free merging methods trim and align the task vector (i.e., the difference in parameter values between fine-tuned and pre-trained models) in parameter space. (b) RMM embeds RL into the merging framework, resulting in an automatic, search-based merging paradigm.
  • Figure 2: Overview of the proposed RMM. Our framework incorporates RL into the merging procedure, searching for the layer-wise optimal architecture through interactions between the environment and the agent. In each step, the merging map is presented to the agent as the state, and the agent chooses an action based on it. At the end of an episode, the merged model is handed over to the environment for evaluation, and the returned reward is used to optimize the agent's decisions. This process is repeated until convergence.
  • Figure 3: Episode-Reward variation. We illustrate the variation in reward per episode, indicating the advantages of DAR in enhancing search performance.
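
The task vector defined in Figure 1 and the layer-wise decisions described in Figure 2 can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the paper's method: models are plain dicts of float lists, and the action set (`"avg"` to average task vectors, or an integer index to copy that model's layer) is hypothetical.

```python
def task_vector(finetuned, pretrained):
    """Per-layer difference between fine-tuned and pre-trained parameters."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }

def merge_layerwise(pretrained, finetuned_models, actions):
    """Build a merged model from one action per layer.

    actions[name] is either "avg" (add the mean task vector to the
    pre-trained layer) or an int k (copy the layer from model k).
    Both action types are assumptions made for this sketch.
    """
    vectors = [task_vector(m, pretrained) for m in finetuned_models]
    merged = {}
    for name, base in pretrained.items():
        action = actions[name]
        if action == "avg":
            mean_tv = [
                sum(vals) / len(vectors)
                for vals in zip(*(v[name] for v in vectors))
            ]
            merged[name] = [b + d for b, d in zip(base, mean_tv)]
        else:
            merged[name] = list(finetuned_models[action][name])
    return merged

pretrained = {"layer0": [0.0, 0.0], "layer1": [1.0, 1.0]}
model_a = {"layer0": [1.0, 0.0], "layer1": [1.0, 2.0]}
model_b = {"layer0": [0.0, 1.0], "layer1": [3.0, 1.0]}

# One decision per layer, as an agent might emit over an episode.
actions = {"layer0": "avg", "layer1": 0}
merged = merge_layerwise(pretrained, [model_a, model_b], actions)
print(merged)  # {'layer0': [0.5, 0.5], 'layer1': [1.0, 2.0]}
```

In a search-based setting, the `actions` dict would be proposed by the agent and the resulting `merged` model scored by the environment; the fixed dict here only stands in for one such episode.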