FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis
Santosh Sanjeev, Nuren Zhaksylyk, Ibrahim Almakky, Anees Ur Rehman Hashmi, Mohammad Areeb Qazi, Mohammad Yaqub
TL;DR
The paper tackles the challenge of transferring pre-trained models to medical imaging in the face of heterogeneous data and distribution shifts, where traditional model soups underperform due to rough error landscapes. It introduces Fast Geometric Generation (FGG), which uses a cyclical learning-rate schedule to generate diverse weight-space models with minimal hyperparameter search, and Hierarchical Souping (HS), a multi-level model averaging scheme tailored to medical data. Together, FGG and HS yield significant gains over standard model soups (e.g., around 6% on HAM10000 and CheXpert) and improve robustness on out-of-distribution data, while reducing computational cost compared to grid-search ensembles. The approach demonstrates strong performance across natural and medical imaging datasets using ResNet50 and DeiT-B backbones and offers practical benefits for transfer learning in data-scarce clinical contexts, with avenues for smoothing extremely rough loss landscapes in future work.
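To make the idea of generating several models from one training run more concrete, the following is a minimal sketch of a cyclical learning-rate schedule with checkpointing at the low point of each cycle. The triangular shape, the function names `cyclical_lr` and `snapshot_steps`, and all parameter values are illustrative assumptions, not the schedule used in the paper.

```python
def cyclical_lr(step, cycle_len, lr_max, lr_min):
    """Triangular cycle (assumed shape): lr_max -> lr_min in the first half, back up in the second."""
    t = (step % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
    if t < 0.5:
        return lr_max + (lr_min - lr_max) * (2 * t)
    return lr_min + (lr_max - lr_min) * (2 * t - 1)


def snapshot_steps(total_steps, cycle_len):
    """Steps where the learning rate is at its minimum; a checkpoint would be saved here."""
    return [s for s in range(total_steps) if s % cycle_len == cycle_len // 2]


if __name__ == "__main__":
    # Hypothetical values: 3 cycles of 4 steps each.
    for step in range(12):
        print(step, cyclical_lr(step, cycle_len=4, lr_max=0.1, lr_min=0.01))
    print("checkpoints at steps:", snapshot_steps(12, 4))
```

Each saved checkpoint is a candidate model for later weight-space averaging, so a single fine-tuning run can supply several soup ingredients instead of one.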
Abstract
The scarcity of well-annotated medical datasets requires leveraging transfer learning from broader datasets like ImageNet or pre-trained models like CLIP. Model soups average multiple fine-tuned models, aiming to improve performance on In-Domain (ID) tasks and enhance robustness on Out-of-Distribution (OOD) datasets. However, applying these methods to the medical imaging domain is challenging and yields suboptimal performance. This is primarily due to differences in error surface characteristics that stem from data complexities such as heterogeneity, domain shift, class imbalance, and distributional shifts between training and testing phases. To address this issue, we propose a hierarchical merging approach that aggregates models locally and globally at various levels based on the models' hyperparameter configurations. Furthermore, to alleviate the need for training a large number of models in the hyperparameter search, we introduce a computationally efficient method that uses a cyclical learning rate scheduler to produce multiple models for aggregation in the weight space. Our method demonstrates significant improvements over the model souping approach across multiple datasets (around 6% gain on the HAM10000 and CheXpert datasets) while maintaining low computational costs for model generation and selection. Moreover, we achieve better results on OOD datasets than model soups. The code is available at https://github.com/BioMedIA-MBZUAI/FissionFusion.
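The local-then-global aggregation described above can be illustrated with a small sketch. It assumes uniform averaging at both levels and grouping by a single hyperparameter; the names `average_weights` and `hierarchical_soup`, the `lr` grouping key, and the toy weights are hypothetical and do not reproduce the selection rules of the paper or its released code.

```python
from collections import defaultdict


def average_weights(state_dicts):
    """Uniform soup: element-wise mean of parameter dicts (name -> list of floats)."""
    n = len(state_dicts)
    return {
        name: [sum(vals) / n for vals in zip(*(sd[name] for sd in state_dicts))]
        for name in state_dicts[0]
    }


def hierarchical_soup(models, group_key):
    """models: list of (hyperparams, state_dict) pairs.

    Local step: average models sharing the same value of `group_key`.
    Global step: average the resulting per-group soups.
    """
    groups = defaultdict(list)
    for hyperparams, state_dict in models:
        groups[hyperparams[group_key]].append(state_dict)
    local_soups = [average_weights(sds) for sds in groups.values()]
    return average_weights(local_soups)


if __name__ == "__main__":
    # Toy example: two models fine-tuned with lr=0.1, one with lr=0.01.
    models = [
        ({"lr": 0.1}, {"w": [1.0, 2.0]}),
        ({"lr": 0.1}, {"w": [3.0, 4.0]}),
        ({"lr": 0.01}, {"w": [6.0, 9.0]}),
    ]
    print(hierarchical_soup(models, "lr"))
```

Unlike a flat average over all three models, the two-level average gives each hyperparameter group equal weight regardless of how many models it contains.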
