Model Fusion via Neuron Transplantation
Muhammed Öz, Nicholas Kiefer, Charlotte Debus, Jasmin Hörter, Achim Streit, Markus Götz
TL;DR
The paper tackles the challenge of deploying ensemble neural networks under tight memory and latency constraints. It introduces Neuron Transplantation (NT), a pruning-based fusion method that selectively transplants high-magnitude neurons from the ensemble members into a single fused model, with cross-weights learned during fine-tuning. NT reaches ensemble-like performance while remaining memory- and time-efficient, often converges faster than optimal-transport-based fusion, and surpasses individual models of the same capacity after modest fine-tuning. While NT scales well across diverse architectures and datasets, its gains saturate when too many similar models are fused, which highlights the importance of ensemble diversity.
Abstract
Ensemble learning is a widespread technique to improve the prediction performance of neural networks. However, it comes at the price of increased memory and inference time. In this work, we propose a novel model fusion technique called Neuron Transplantation (NT), in which we fuse an ensemble of models by transplanting important neurons from all ensemble members into the vacant space obtained by pruning insignificant neurons. An initial loss in performance post-transplantation can be quickly recovered via fine-tuning, after which the fused model consistently outperforms individual ensemble members of the same model capacity and architecture. Furthermore, NT enables all the ensemble members to be jointly pruned and jointly trained in a combined model. Compared with alignment-based averaging (such as Optimal-Transport (OT) fusion), NT requires less fine-tuning than the corresponding OT-fused model, the fusion itself is faster and requires less memory, and the resulting model performance is comparable or better. The code is available under the following link: https://github.com/masterbaer/neuron-transplantation.
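The transplantation step described in the abstract can be illustrated with a minimal sketch for one-hidden-layer MLPs stored as plain Python lists. It is not taken from the linked repository. The helper names (make_mlp, concat_ensemble, transplant), the importance score (L1 norm of a neuron's incoming weights plus its bias), and the 1/k scaling of the output weights are assumptions made for illustration. Fine-tuning, which the abstract says is needed to recover the initial loss in performance, is not shown.

```python
import random

def make_mlp(d_in, d_hidden, d_out, rng):
    """Hypothetical helper: a random one-hidden-layer MLP as nested lists."""
    return {
        "W1": [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)],
        "b1": [rng.gauss(0, 1) for _ in range(d_hidden)],
        "W2": [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)],
    }

def concat_ensemble(models):
    """Stack the hidden neurons of all members into one wide model.

    Output weights are scaled by 1/k so the wide model averages the
    members' outputs (an assumption of this sketch).
    """
    k = len(models)
    d_out = len(models[0]["W2"])
    return {
        "W1": [row for m in models for row in m["W1"]],
        "b1": [b for m in models for b in m["b1"]],
        "W2": [[w / k for m in models for w in m["W2"][o]] for o in range(d_out)],
    }

def transplant(models):
    """Keep the h highest-scoring hidden neurons of the concatenated model."""
    wide = concat_ensemble(models)
    h = len(models[0]["b1"])  # fused model keeps the size of one member
    # Assumed importance score: L1 norm of incoming weights plus |bias|.
    scores = [sum(abs(w) for w in row) + abs(b)
              for row, b in zip(wide["W1"], wide["b1"])]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:h])  # indices of surviving (transplanted) neurons
    fused = {
        "W1": [wide["W1"][i] for i in keep],
        "b1": [wide["b1"][i] for i in keep],
        "W2": [[row[i] for i in keep] for row in wide["W2"]],
    }
    return fused, keep

# Usage: fuse three members with 6 hidden neurons each into one 6-neuron model.
rng = random.Random(0)
models = [make_mlp(4, 6, 3, rng) for _ in range(3)]
fused, keep = transplant(models)
```

In this sketch, keep holds indices into the concatenated hidden layer, so i // 6 gives the ensemble member each surviving neuron came from. With a single hidden layer there are no connections between neurons of different members. In deeper networks such cross-weights would exist between consecutive layers and, as the TL;DR notes, are learned during fine-tuning.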
