In-Model Merging for Enhancing the Robustness of Medical Imaging Classification Models
Hu Wang, Ibrahim Almakky, Congbo Ma, Numan Saeed, Mohammad Yaqub
TL;DR
This work tackles robustness in medical imaging classification by introducing InMerge, a single-model, finetuning-based kernel-merging strategy that reduces kernel redundancy within the deeper layers of a CNN. Kernel similarity is measured with cosine similarity, $\mathrm{sim}(\mathbf{k}_i,\mathbf{k}_j) = \frac{\mathbf{k}_i^\top \mathbf{k}_j}{\|\mathbf{k}_i\| \|\mathbf{k}_j\|}$, where $\mathbf{k}_i$ is the vectorized form of kernel $\mathbf{K}_i$. When $\mathrm{sim}(\mathbf{k}_i,\mathbf{k}_j) > \tau$, the kernels are merged by interpolation, $\mathbf{K}_i \leftarrow \alpha \mathbf{K}_i + (1-\alpha) \mathbf{K}_j$. Merging is applied stochastically with probability $p$ and skips the first $L_s$ shallow layers. The method adds no inference cost and improves AUROC and accuracy on ChestXRay14 and MedMNIST, with ablations showing the importance of deep-layer merging and of the hyperparameter choices. The findings suggest a practical regularization mechanism for robust medical imaging models and point toward future extensions to transformer-based architectures.
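The merging rule in the TL;DR can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`cosine_similarity_matrix`, `merge_similar_kernels`, `inmerge_step`), the default hyperparameter values, the choice to pair each kernel with its single most similar partner, and the choice to read from the pre-merge weights are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(kernels):
    """Pairwise cosine similarity between the kernels of one conv layer.

    kernels has shape (out_channels, in_channels, kh, kw); each kernel K_i
    is flattened into a vector k_i before comparison.
    """
    flat = kernels.reshape(kernels.shape[0], -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)  # guard against all-zero kernels
    return unit @ unit.T


def merge_similar_kernels(kernels, tau=0.9, alpha=0.5, p=0.5, rng=None):
    """Return a copy in which K_i <- alpha*K_i + (1-alpha)*K_j if sim > tau.

    Assumptions (not stated in the source): each kernel is paired with its
    most similar partner only, and all updates read the pre-merge weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    sim = cosine_similarity_matrix(kernels)
    np.fill_diagonal(sim, -np.inf)  # a kernel never merges with itself
    merged = kernels.copy()
    for i in range(kernels.shape[0]):
        j = int(np.argmax(sim[i]))
        # Merge only above the threshold, and only with probability p.
        if sim[i, j] > tau and rng.random() < p:
            merged[i] = alpha * kernels[i] + (1.0 - alpha) * kernels[j]
    return merged


def inmerge_step(layer_kernels, L_s=1, **kwargs):
    """Apply merging to every layer except the first L_s shallow layers."""
    return [
        k.copy() if idx < L_s else merge_similar_kernels(k, **kwargs)
        for idx, k in enumerate(layer_kernels)
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy network: three conv layers with random weights.
    layers = [rng.normal(size=(8, 3, 3, 3)) for _ in range(3)]
    new_layers = inmerge_step(layers, L_s=1, tau=0.2, alpha=0.5, p=0.5, rng=rng)
    print([float(np.abs(a - b).max()) for a, b in zip(layers, new_layers)])
```

In a training loop, a step like this would presumably run periodically on the convolutional weights between optimizer updates. Because it only rewrites existing weights, the network architecture and the inference cost stay unchanged.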
Abstract
Model merging is an effective strategy for combining multiple models to improve performance, and it is more efficient than ensemble learning because it adds no extra computation at inference time. However, little research has explored whether merging can take place within a single model and improve that model's robustness, which is particularly critical in the medical imaging domain. In this paper, we are the first to propose in-model merging (InMerge), a novel approach that enhances a model's robustness by selectively merging similar convolutional kernels in the deep layers of a single convolutional neural network (CNN) during training for classification. We also analytically reveal important characteristics that affect how in-model merging should be performed, providing a reference for the community. We demonstrate the feasibility and effectiveness of this technique for different CNN architectures on four widely used datasets. The InMerge-trained model surpasses the conventionally trained model by a substantial margin. The code will be made public.
