MedMerge: Merging Models for Effective Transfer Learning to Medical Imaging Tasks

Ibrahim Almakky, Santosh Sanjeev, Anees Ur Rehman Hashmi, Mohammad Areeb Qazi, Hu Wang, Mohammad Yaqub

TL;DR

The authors address transfer learning in medical imaging by enabling merging of models pre-trained on different tasks. MedMerge learns kernel-level weights to blend two backbones from distinct datasets into a unified model for a new target dataset, with one extractor anchored during training. Empirical results across HAM10K, ISIC-2019, EyePACS, and APTOS show MedMerge achieving higher macro F1 and accuracy than standard fine-tuning, LP-FT, and other merging methods, with notable gains and robust generalization. The work demonstrates layer-wise feature transfer patterns and offers a path toward broader cross-task merging, including future exploration of Transformer architectures and multi-initialization merging.

Abstract

Transfer learning has become a powerful tool to initialize deep learning models to achieve faster convergence and higher performance. This is especially useful in the medical imaging analysis domain, where data scarcity limits possible performance gains for deep learning models. Some advancements have been made in boosting the transfer learning performance gain by merging models starting from the same initialization. However, in the medical imaging analysis domain, there is an opportunity to merge models starting from different initializations, thus combining the features learned from different tasks. In this work, we propose MedMerge, a method whereby the weights of different models can be merged, and their features can be effectively utilized to boost performance on a new task. With MedMerge, we learn kernel-level weights that can later be used to merge the models into a single model, even when starting from different initializations. Testing on various medical imaging analysis tasks, we show that our merged model can achieve significant performance gains, with up to 7% improvement on the F1 score. The code implementation of this work is available at github.com/BioMedIA-MBZUAI/MedMerge.
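
The kernel-level merging described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes that both backbones share one architecture, that each convolutional kernel (output channel) has a single coefficient in [0, 1], and that the merged kernel is a convex combination of the two source kernels. The names `merge_conv_kernels`, `merge_backbones`, and `alphas` are hypothetical, and the coefficients are simply supplied here rather than learned.

```python
import numpy as np


def merge_conv_kernels(w_a, w_b, alpha):
    """Blend two conv weight tensors kernel by kernel.

    w_a, w_b: arrays shaped (out_channels, in_channels, kh, kw).
    alpha:    array shaped (out_channels,), one coefficient per kernel.
              Assumption: values in [0, 1], so the blend is convex.
    """
    w_a = np.asarray(w_a, dtype=float)
    w_b = np.asarray(w_b, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    if w_a.shape != w_b.shape:
        raise ValueError("both backbones must share the same layer shapes")
    if alpha.shape != (w_a.shape[0],):
        raise ValueError("expected one coefficient per output kernel")
    # Reshape so each coefficient broadcasts over its whole kernel.
    a = alpha.reshape((-1,) + (1,) * (w_a.ndim - 1))
    return a * w_a + (1.0 - a) * w_b


def merge_backbones(state_a, state_b, alphas):
    """Merge two dicts of layer weights into a single dict of weights."""
    return {
        name: merge_conv_kernels(state_a[name], state_b[name], alphas[name])
        for name in state_a
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for one conv layer from each pre-trained backbone.
    state_a = {"conv1": rng.normal(size=(4, 3, 3, 3))}
    state_b = {"conv1": rng.normal(size=(4, 3, 3, 3))}
    # Hypothetical coefficients; in the paper's setting they would be learned.
    alphas = {"conv1": np.array([1.0, 0.0, 0.5, 0.25])}
    merged = merge_backbones(state_a, state_b, alphas)
    print(merged["conv1"].shape)
```

With a coefficient of 1 a kernel is taken entirely from the first backbone, and with 0 entirely from the second, so the result is a single model with the same size as either source model.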

Paper Structure (5 sections, 3 figures, 2 tables)

This paper contains 5 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: An outline of the proposed MedMerge method, in which the kernels from two pre-trained models are combined towards a new task.
  • Figure 2: Loss surface plots showing the training and testing error on a two-dimensional slice of the error landscape for MedMerge compared with Permutation [entezari2021role] and ZipIt! [stoica2023zipit]. The plots are for a ResNet-18 model trained on the ISIC-2019 dataset, with each method starting from the same ImageNet and HAM10K pre-trained models (represented by the square and circle). The diamond represents the merged model.
  • Figure 3: Model depth-wise heatmaps showing (a) the mean of the learned kernel-based weights at every convolutional layer of the DenseNet-121 backbone, and (b) the same per-layer mean when merging one of the source initializations with zeros towards APTOS.