Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking

Dengming Zhang; Xiaowen Ma; Zhenliang Ni; Zhenkai Wu; Han Shu; Xin Jiang; Xinghao Chen

TL;DR

This work tackles the problem of merging multiple domain-specific experts into a single model without full retraining. It introduces Expert Merging, which learns per-layer coefficients that explicitly align the merged model’s hidden states and logits with those of each expert using unlabeled calibration data, and Expert Merging++, which uses importance-guided chunking to allocate more chunk-wise coefficients to high-importance layers. The approach combines hidden-state and logit alignment losses, a coefficient regularizer for stability, and task-weighted losses for controllable trade-offs. It achieves state-of-the-art or competitive performance across LLM and MLLM backbones, in some cases surpassing supervised Mixture Training. The results indicate robust cross-domain performance with label-free calibration and parameter-efficient merging, offering a practical way to deploy multi-domain capabilities at scale and motivating further research on inter-layer heterogeneity in model merging.

Abstract

Model merging, which combines multiple domain-specialized experts into a single model, offers a practical path to endow Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) with broad capabilities without the cost of joint training or serving many models. However, training-free methods rely on hand-tuned coefficients, whereas training-based methods primarily align parameters rather than downstream task behavior and typically treat all layers uniformly, ignoring inter-layer heterogeneity. We introduce Expert Merging, a training-light method that learns a small set of layer-wise coefficients using only unlabeled calibration data. The coefficients are optimized to explicitly align the merged model's hidden states and logits with those of the corresponding experts, with a coefficient regularizer for stability and task-weighted losses for controllable trade-offs. To capture inter-layer variation, Expert Merging++ augments this design with importance-guided chunking: a normalized layer-importance metric, derived from learned coefficients, task-vector magnitudes, and parameter counts, allocates more chunk-wise coefficients to high-importance layers while keeping low-importance layers lightweight. The result is a label-free, parameter-efficient, and scalable approach to multi-expert model merging across LLMs and MLLMs. Across MLLM backbones (InternVL and Qwen2-VL) and the LLM backbone (Mistral), our method surpasses strong training-free and training-based merging baselines, with Expert Merging++ delivering further gains and, in some cases, even exceeding supervised Mixture Training. The source code is available at https://github.com/Littleor/ExpertMerging.
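The layer-wise merging and alignment objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes the common task-vector form, where each expert contributes its difference from a shared base model, scaled by one scalar per expert per layer. The function names, the use of mean squared error for both alignment terms, the squared-magnitude regularizer, and the default weights are all assumptions made for illustration.

```python
# Hypothetical sketch of layer-wise merging with learnable coefficients.
# All names and loss forms below are illustrative assumptions,
# not the paper's actual code.

def merge_layerwise(base, experts, coeffs):
    """Merge experts into the base model, layer by layer.

    base:    list of layers, each a flat list of floats
    experts: list of expert models, each shaped like `base`
    coeffs:  coeffs[k][l] is the scalar for expert k at layer l

    Each merged layer is base[l] + sum_k coeffs[k][l] * (experts[k][l] - base[l]).
    """
    merged = []
    for l, base_layer in enumerate(base):
        layer = list(base_layer)
        for k, expert in enumerate(experts):
            for i, w in enumerate(expert[l]):
                # (w - base_layer[i]) is the task-vector entry for expert k
                layer[i] += coeffs[k][l] * (w - base_layer[i])
        merged.append(layer)
    return merged


def mse(a, b):
    """Mean squared error between two equal-length flat lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def alignment_loss(merged_hidden, expert_hidden, merged_logits, expert_logits,
                   expert_coeffs, logit_weight=1.0, reg_weight=0.01):
    """Alignment objective for one expert on one unlabeled calibration input.

    Assumed form: hidden-state MSE + weighted logit MSE + a squared-magnitude
    penalty on that expert's coefficients (a stand-in for the regularizer).
    """
    hidden_term = mse(merged_hidden, expert_hidden)
    logit_term = mse(merged_logits, expert_logits)
    reg_term = sum(c * c for c in expert_coeffs) / len(expert_coeffs)
    return hidden_term + logit_weight * logit_term + reg_weight * reg_term


# Toy usage: two layers, two experts.
base = [[1.0, 2.0], [0.0, 0.0]]
expert_a = [[2.0, 2.0], [0.0, 4.0]]
expert_b = [[1.0, 4.0], [2.0, 0.0]]
coeffs = [[1.0, 0.5],   # expert A: layer 0, layer 1
          [0.5, 0.0]]   # expert B: layer 0, layer 1
merged = merge_layerwise(base, [expert_a, expert_b], coeffs)
```

In a real setting the coefficients would be the only trainable parameters, updated by gradient descent on the alignment loss summed over experts (with per-task weights) while the base and expert weights stay frozen. No labels are needed because the targets are the experts' own hidden states and logits on the calibration inputs.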
