Model Assembly Learning with Heterogeneous Layer Weight Merging
Yi-Kai Zhang, Jin Wang, Xu-Xiang Zhong, De-Chuan Zhan, Han-Jia Ye
TL;DR
To extend pre-trained models without access to target data, the paper proposes Model Assembly Learning (MAL), which merges parameters from heterogeneous architectures in a large model zoo into a base model. MAL performs layer-wise parameter merging guided by generalized permutations, zero-padding, bidirectional alignment, and a set of merging laws, with linear mode connectivity serving as a validation signal. The approach relaxes the requirement of uniform architectures and provides optimization strategies for mismatched layer shapes, including LAP-equivalence and alternating optimization. Experiments across 30 architectures and multiple datasets demonstrate effective knowledge transfer, preserve domain performance up to a merging threshold, and yield practical guidelines for heterogeneous parameter fusion.
Abstract
Model merging acquires general capabilities without extra data or training by combining multiple models' parameters. Previous approaches achieve linear mode connectivity by aligning parameters into the same loss basin using permutation invariance. In this paper, we introduce Model Assembly Learning (MAL), a novel paradigm for model merging that iteratively integrates parameters from diverse models in an open-ended model zoo to enhance the base model's capabilities. Unlike previous works that require identical architectures, MAL allows the merging of heterogeneous architectures and selective parameters across layers. Specifically, the base model can incorporate parameters from different layers of multiple pre-trained models. We systematically investigate the conditions and fundamental settings of heterogeneous parameter merging, addressing all possible mismatches in layer widths between the base and target models. Furthermore, we establish key laws and provide practical guidelines for effectively implementing MAL.
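To make the idea of permutation-based alignment under a width mismatch concrete, here is a minimal sketch. It is not the authors' implementation. It covers only one case, where the incoming layer is narrower than the base layer, and it assumes that weights are plain lists of rows (one row per output unit), that the alignment cost is squared Euclidean distance, and that the merge is a simple linear interpolation with coefficient `alpha`. The function names (`zero_pad_rows`, `best_permutation`, `merge_layer`) are hypothetical. The brute-force search over permutations stands in for a linear assignment problem (LAP) solver and is only feasible for tiny widths.

```python
from itertools import permutations


def zero_pad_rows(weight, width):
    """Append all-zero rows so the matrix has `width` output units."""
    n_in = len(weight[0])
    padding = [[0.0] * n_in for _ in range(width - len(weight))]
    return [list(row) for row in weight] + padding


def alignment_cost(base, other, perm):
    """Squared distance between base row i and other row perm[i], summed over i."""
    return sum(
        (b - o) ** 2
        for i, p in enumerate(perm)
        for b, o in zip(base[i], other[p])
    )


def best_permutation(base, other):
    """Exhaustive search over row permutations (assumption: tiny widths only).

    For realistic layer widths this step would be handed to a LAP solver.
    """
    n = len(base)
    return min(permutations(range(n)),
               key=lambda perm: alignment_cost(base, other, perm))


def merge_layer(base, other, alpha=0.5):
    """Zero-pad `other` to the base width, align its rows, then interpolate."""
    padded = zero_pad_rows(other, len(base))
    perm = best_permutation(base, padded)
    aligned = [padded[p] for p in perm]
    merged = [
        [(1 - alpha) * b + alpha * o for b, o in zip(b_row, o_row)]
        for b_row, o_row in zip(base, aligned)
    ]
    return perm, merged


# Illustrative weights (made up): a 3-unit base layer and a 2-unit incoming layer.
base_w = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
other_w = [[0.0, 1.0], [1.0, 0.0]]
perm, merged_w = merge_layer(base_w, other_w)
print(perm)      # (1, 0, 2): the two incoming units are swapped, the padded unit stays last
print(merged_w)
```

In a full network, the same permutation would also have to be applied to the input dimension of the following layer so that the permuted model computes the same function; the sketch omits that step and aligns a single layer in isolation.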
