
Model Fusion via Neuron Transplantation

Muhammed Öz, Nicholas Kiefer, Charlotte Debus, Jasmin Hörter, Achim Streit, Markus Götz

TL;DR

The paper tackles the challenge of deploying ensemble neural networks under tight memory and latency constraints. It introduces Neuron Transplantation (NT), a pruning-based fusion method that selectively transplants high-magnitude neurons from ensemble members into a single fused model, with the cross-weights between neurons from different members learned during fine-tuning. NT approaches ensemble-like performance while remaining memory- and time-efficient, often converging faster than optimal-transport-based fusion and surpassing individual models of the same capacity after modest fine-tuning. While NT scales well across diverse architectures and datasets, its gains saturate when too many similar models are fused, highlighting the importance of ensemble diversity for maximum gains.
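To illustrate the zero-initialized cross-weights mentioned above, here is a minimal NumPy sketch (not the authors' code) that concatenates two hypothetical two-hidden-layer ReLU MLPs into one wide network. The hidden neurons of both members are stacked. The weights connecting one member's neurons to the other member's next layer start at zero, so each sub-network computes exactly what it did before. These off-diagonal blocks are what fine-tuning could later learn. The helper names `init_mlp`, `forward`, and `concatenate` are illustrative. Averaging the members' logits in the output layer is an assumption.

```python
import numpy as np

def init_mlp(rng, d_in, h1, h2, d_out):
    """Random weights for a hypothetical 2-hidden-layer ReLU MLP."""
    return {
        "W1": rng.normal(size=(h1, d_in)), "b1": rng.normal(size=h1),
        "W2": rng.normal(size=(h2, h1)), "b2": rng.normal(size=h2),
        "W3": rng.normal(size=(d_out, h2)), "b3": rng.normal(size=d_out),
    }

def forward(p, x):
    """x has shape (batch, d_in); returns logits of shape (batch, d_out)."""
    a1 = np.maximum(0.0, x @ p["W1"].T + p["b1"])
    a2 = np.maximum(0.0, a1 @ p["W2"].T + p["b2"])
    return a2 @ p["W3"].T + p["b3"]

def concatenate(pa, pb):
    """Stack the hidden neurons of two members into one wide model.

    Cross-weights (A's layer-1 neurons -> B's layer-2 neurons and vice versa)
    are zero, so each member's sub-network is unchanged by the concatenation.
    """
    h2a, h1a = pa["W2"].shape
    h2b, h1b = pb["W2"].shape
    W2 = np.zeros((h2a + h2b, h1a + h1b))
    W2[:h2a, :h1a] = pa["W2"]  # A -> A block
    W2[h2a:, h1a:] = pb["W2"]  # B -> B block; off-diagonal cross-weights stay 0
    return {
        "W1": np.vstack([pa["W1"], pb["W1"]]),
        "b1": np.concatenate([pa["b1"], pb["b1"]]),
        "W2": W2,
        "b2": np.concatenate([pa["b2"], pb["b2"]]),
        # Assumption: the fused output layer averages the members' logits.
        "W3": 0.5 * np.hstack([pa["W3"], pb["W3"]]),
        "b3": 0.5 * (pa["b3"] + pb["b3"]),
    }

rng = np.random.default_rng(0)
model_a = init_mlp(rng, 8, 16, 16, 3)
model_b = init_mlp(rng, 8, 16, 16, 3)
fused = concatenate(model_a, model_b)
x = rng.normal(size=(5, 8))
print(fused["W2"].shape, forward(fused, x).shape)
```

With this construction the fused model reproduces the average of the two members' logits exactly. Pruning would then shrink it back to one member's width.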

Abstract

Ensemble learning is a widespread technique to improve the prediction performance of neural networks. However, it comes at the price of increased memory and inference time. In this work we propose a novel model fusion technique called Neuron Transplantation (NT), in which we fuse an ensemble of models by transplanting important neurons from all ensemble members into the vacant space obtained by pruning insignificant neurons. The initial loss in performance after transplantation can be quickly recovered via fine-tuning, after which the fused model consistently outperforms individual ensemble members of the same model capacity and architecture. Furthermore, NT enables all ensemble members to be jointly pruned and jointly trained in a combined model. Compared to alignment-based averaging (such as Optimal-Transport fusion), NT requires less fine-tuning than the corresponding OT-fused model, the fusion itself is faster and requires less memory, and the resulting model performance is comparable or better. The code is available at https://github.com/masterbaer/neuron-transplantation.
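The prune-after-concatenation step can be sketched for a single hidden layer as follows. This is an illustrative NumPy sketch, not the authors' implementation. It assumes that a neuron's importance is the L2 norm of its incoming weights and bias. The helper `prune_hidden_layer` is hypothetical, and the criterion used in the authors' repository may differ. Removing a neuron deletes three things: its row in the incoming weight matrix, its bias entry, and its column in the outgoing weight matrix.

```python
import numpy as np

def prune_hidden_layer(W_in, b_in, W_out, keep):
    """Keep the `keep` hidden neurons with the largest incoming-weight magnitude.

    W_in: (n_hidden, d_in), one row per neuron; b_in: (n_hidden,);
    W_out: (d_out, n_hidden), one column per neuron.
    Assumption: importance = L2 norm of [incoming weights, bias].
    """
    scores = np.linalg.norm(np.hstack([W_in, b_in[:, None]]), axis=1)
    kept = np.sort(np.argsort(scores)[-keep:])  # top-`keep` indices, original order
    return W_in[kept], b_in[kept], W_out[:, kept], kept

rng = np.random.default_rng(1)
d_in, width, d_out = 8, 16, 3

# Two hypothetical trained members, each with one hidden layer of `width` neurons.
members = [
    (rng.normal(size=(width, d_in)), rng.normal(size=width), rng.normal(size=(d_out, width)))
    for _ in range(2)
]

# Concatenate: stack hidden neurons (rows of W_in, columns of W_out).
# Output-bias handling is omitted here for brevity.
W_in = np.vstack([m[0] for m in members])
b_in = np.concatenate([m[1] for m in members])
W_out = np.hstack([m[2] for m in members])

# Prune the 2 * width hidden neurons back to the original width.
W_in_p, b_in_p, W_out_p, kept = prune_hidden_layer(W_in, b_in, W_out, keep=width)
from_a = int(np.sum(kept < width))  # indices below `width` belong to member A
print(f"kept {from_a} neurons from member A and {width - from_a} from member B")
```

The kept neurons may come from either member. The printed split therefore corresponds to the effective transplantation rate for this layer.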

Paper Structure

This paper contains 29 sections, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Neuron Transplantation. Low-magnitude neurons are replaced by large-magnitude ones from other models.
  • Figure 2: Pipeline of fusing multiple ensemble members. Multiple models are trained independently, concatenated into one large model, pruned down to the original size and then fine-tuned.
  • Figure 3: Concatenating 2D convolution layers. Channels are stacked, batch normalization and pooling operations are preserved.
  • Figure 4: Left: Transplanting different neuron amounts of one model into another. Without fine-tuning, the test accuracy of the fused model drops symmetrically. With 3 epochs of fine-tuning, the fused model surpasses individual model performance, peaking at a 50% transplantation rate. Right: Fusing multiple models and pruning to specific sparsity ratios, followed by 30 epochs of fine-tuning. Marked with "x" is the sparsity ratio at which an individual model's size is recovered. As more models are fused, this point shifts closer to a sparsity of one, where most performance is lost.
  • Figure 5: Mean accuracy of NT, OT and vanilla averaging over five seeds after fusing two models. All methods beat individual accuracy after different amounts of fine-tuning (left) and distillation (right).
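The sparsity at which an individual model's size is recovered (the "x" marks in Figure 4) can be estimated with simple arithmetic. The sketch below assumes that k members of identical width are concatenated and then pruned back to one member's width. It is an assumption here whether the paper's sparsity ratio counts neurons or weights, so both readings are shown. For the weight reading, the concatenated hidden-to-hidden matrix is taken to include its zero cross-weight blocks. `neuron_sparsity` and `hidden_weight_sparsity` are hypothetical helpers.

```python
from fractions import Fraction

def neuron_sparsity(k):
    """Fraction of hidden neurons removed when k same-width members are
    concatenated and pruned back to one member's width (assumed reading)."""
    return 1 - Fraction(1, k)

def hidden_weight_sparsity(k):
    """Fraction of a hidden-to-hidden weight matrix removed: the concatenated
    (k*n x k*n) matrix, zero cross-blocks included, is pruned back to (n x n)."""
    return 1 - Fraction(1, k * k)

for k in (2, 4, 8):
    print(k, neuron_sparsity(k), hidden_weight_sparsity(k))
```

Under either reading, the required sparsity approaches one as k grows. For k = 8, that means removing 7/8 of the neurons or 63/64 of a hidden-to-hidden weight matrix. This is consistent with the caption's observation that fusing more models pushes this point toward a sparsity of one.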