Training-free Heterogeneous Model Merging

Zhengqi Xu; Han Zheng; Jie Song; Li Sun; Mingli Song

TL;DR

The paper tackles training-free merging of structurally heterogeneous models, addressing the depth and width differences that prevent conventional homogeneous merging. For depth heterogeneity, it aligns layers either segment-wise or layer-wise. For width heterogeneity, it introduces elastic neuron zipping, which projects weights onto a common width without retraining. In experiments on vision (ResNet, VGG) and NLP (BERT-based encoders) tasks, the proposed methods achieve performance comparable to homogeneous merging and outperform naive weight averaging. The paper also analyzes representation alignment using CKA. The work shows that task-specific models with different architectures can be integrated into a single model across domains, and the code is publicly available.

Abstract

Model merging has attracted significant attention as a powerful paradigm for model reuse, facilitating the integration of task-specific models into a singular, versatile framework endowed with multifarious capabilities. Previous studies, predominantly utilizing methods such as Weight Average (WA), have shown that model merging can effectively leverage pretrained models without the need for laborious retraining. However, the inherent heterogeneity among models poses a substantial constraint on its applicability, particularly when confronted with discrepancies in model architectures. To overcome this challenge, we propose an innovative model merging framework designed for heterogeneous models, encompassing both depth and width heterogeneity. To address depth heterogeneity, we introduce a layer alignment strategy that harmonizes model layers by segmenting deeper models, treating consecutive layers with similar representations as a cohesive segment, thus enabling the seamless merging of models with differing layer depths. For width heterogeneity, we propose a novel elastic neuron zipping algorithm that projects the weights from models of varying widths onto a common dimensional space, eliminating the need for identical widths. Extensive experiments validate the efficacy of these proposed methods, demonstrating that the merging of structurally heterogeneous models can achieve performance levels comparable to those of homogeneous merging, across both vision and NLP tasks. Our code is publicly available at https://github.com/zju-vipa/training_free_heterogeneous_model_merging.
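To make "layers with similar representations" concrete, the following is a minimal sketch of linear CKA, the similarity measure the summary mentions for analyzing representation alignment. It is an illustration under assumptions, not the authors' implementation: the function name `linear_cka` and the use of NumPy are hypothetical choices, and the paper may use a different CKA variant or segmentation rule. The inputs are activation matrices collected from two layers on the same batch of samples.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices.

    x has shape (n_samples, d1) and y has shape (n_samples, d2).
    The feature widths d1 and d2 may differ, which is why a measure
    like this can compare layers of models with different widths.
    """
    # Center each feature column over the samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts_a = rng.normal(size=(64, 8))    # hypothetical layer from model A
    acts_b = rng.normal(size=(64, 16))   # hypothetical wider layer from model B
    print(linear_cka(acts_a, acts_a))    # identical representations score 1 (up to rounding)
    print(linear_cka(acts_a, acts_b))    # unrelated representations score lower
```

Scores lie between 0 and 1 and are unchanged by orthogonal transformations and isotropic scaling of the features. One plausible use, stated here as an assumption, is to compute this score between consecutive layers of the deeper model and group runs of high-scoring layers into a segment.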
