FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis

Santosh Sanjeev, Nuren Zhaksylyk, Ibrahim Almakky, Anees Ur Rehman Hashmi, Mohammad Areeb Qazi, Mohammad Yaqub

TL;DR

The paper addresses transferring pre-trained models to medical imaging under heterogeneous data and distribution shifts, where standard model soups underperform because the error landscapes are rough. It introduces Fast Geometric Generation (FGG), which uses a cyclical learning-rate schedule to generate diverse models in weight space with minimal hyperparameter search, and Hierarchical Souping (HS), a multi-level model-averaging scheme tailored to medical data. Together, FGG and HS improve on standard model soups (e.g., by around 6% on HAM10000 and CheXpert) and are more robust on out-of-distribution data, while costing less compute than grid-search ensembles. The approach is evaluated on natural and medical imaging datasets with ResNet50 and DeiT-B backbones and is of practical use for transfer learning in data-scarce clinical settings; smoothing extremely rough loss landscapes is left to future work.
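To make the cyclical learning-rate idea concrete, here is a minimal sketch of a triangular schedule in plain Python. The exact schedule used by FGG is not given in this summary, so the triangular shape, the function names `cyclical_lr` and `snapshot_steps`, and the choice to collect a model whenever the rate reaches its minimum are all assumptions made for illustration.

```python
def cyclical_lr(step, cycle_len, lr_max, lr_min):
    """Triangular cycle (assumed shape): the rate falls from lr_max to lr_min
    over the first half of each cycle and rises back to lr_max over the second."""
    t = ((step % cycle_len) + 1) / cycle_len  # position in the cycle, in (0, 1]
    if t <= 0.5:
        return (1.0 - 2.0 * t) * lr_max + 2.0 * t * lr_min
    return (2.0 - 2.0 * t) * lr_min + (2.0 * t - 1.0) * lr_max


def snapshot_steps(total_steps, cycle_len):
    """Steps at which the rate equals lr_min (mid-cycle); a model checkpoint
    would be saved at each of them. Assumes cycle_len is even."""
    return [s for s in range(total_steps) if ((s % cycle_len) + 1) * 2 == cycle_len]


if __name__ == "__main__":
    for step in range(8):
        print(step, round(cyclical_lr(step, cycle_len=4, lr_max=0.1, lr_min=0.001), 4))
    print("snapshots at steps:", snapshot_steps(12, cycle_len=4))
```

Each cycle yields one checkpoint, so a single training run produces several candidate models for averaging instead of one model per hyperparameter configuration.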

Abstract

The scarcity of well-annotated medical datasets requires leveraging transfer learning from broader datasets like ImageNet or pre-trained models like CLIP. Model soups average multiple fine-tuned models, aiming to improve performance on In-Domain (ID) tasks and enhance robustness on Out-of-Distribution (OOD) datasets. However, applying these methods to the medical imaging domain is challenging and results in suboptimal performance. This is primarily due to differences in error surface characteristics that stem from data complexities such as heterogeneity, domain shift, class imbalance, and distributional shifts between training and testing phases. To address this issue, we propose a hierarchical merging approach that involves local and global aggregation of models at various levels based on the models' hyperparameter configurations. Furthermore, to alleviate the need to train a large number of models in the hyperparameter search, we introduce a computationally efficient method that uses a cyclical learning rate scheduler to produce multiple models for aggregation in the weight space. Our method demonstrates significant improvements over the model souping approach across multiple datasets (around 6% gain on the HAM10000 and CheXpert datasets) while maintaining low computational costs for model generation and selection. Moreover, we achieve better results on OOD datasets than model soups. The code is available at https://github.com/BioMedIA-MBZUAI/FissionFusion.
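The local-then-global aggregation described in the abstract can be sketched as two rounds of weight averaging. This is an illustrative sketch, not the authors' implementation: the abstract does not specify the grouping rule or any model-selection step, so grouping by a single hyperparameter, the uniform averaging at both levels, and the names `average_weights` and `hierarchical_soup` are assumptions. Weights are represented as plain dictionaries of float lists to keep the example self-contained.

```python
from collections import defaultdict


def average_weights(models):
    """Uniform average of weight dicts (parameter name -> list of floats)."""
    n = len(models)
    averaged = {}
    for name in models[0]:
        size = len(models[0][name])
        averaged[name] = [sum(m[name][i] for m in models) / n for i in range(size)]
    return averaged


def hierarchical_soup(models, configs, group_key):
    """Assumed two-level scheme: average models that share configs[i][group_key]
    (local soups), then average the local soups (global soup)."""
    groups = defaultdict(list)
    for weights, cfg in zip(models, configs):
        groups[cfg[group_key]].append(weights)
    local_soups = [average_weights(group) for group in groups.values()]
    return average_weights(local_soups)


if __name__ == "__main__":
    # Toy one-parameter "models" with hypothetical learning-rate settings.
    models = [{"w": [1.0]}, {"w": [3.0]}, {"w": [8.0]}]
    configs = [{"lr": 0.1}, {"lr": 0.1}, {"lr": 0.01}]
    print("uniform soup:     ", average_weights(models))
    print("hierarchical soup:", hierarchical_soup(models, configs, "lr"))
```

In the toy data the two results differ because the hierarchical version gives each hyperparameter group equal weight, regardless of how many models the group contains.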

Paper Structure

This paper contains 7 sections, 1 equation, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Validation error on a two-dimensional slice of the error landscape for several natural and medical domain datasets, following the approach outlined in [garipov2018loss]. (a) CIFAR-10 [krizhevsky2009learning] (b) CIFAR-100 [krizhevsky2009learning] (c) FGVC-Aircrafts [maji13fine-grained] (d) RSNA Pneumonia [rsna] (e) APTOS [aptos2019-blindness-detection] (f) HAM10000 [DVN/DBW86T_2018]. The three models that perform best on the validation set are used, with the best model serving as the reference (origin).
  • Figure 1: Hyperparameter Analysis using Linear Mode Connectivity (LMC) ($\theta = \lambda \cdot \theta_{A} + (1 - \lambda) \cdot \theta_{B}$), where $\theta_{A}$ and $\theta_{B}$ differ only in one hyperparameter. (a) and (d) LMC between models varying only in seed. (b) and (e) LMC between models varying only in augmentation. (c) and (f) LMC between models varying only in learning rate.
  • Figure 2: An illustration of (a) the loss landscape of fine-tuned models, (b) the Fast Geometric Generation (FGG) approach using a cyclical learning rate scheduler, and (c) FGG and the Hierarchical Souping (HS) approach.
  • Figure 3: OOD analysis for different architectures on various datasets (a) CIFAR10 vs. CIFAR10.1 - ResNet50 (b) CheXpert vs. MIMIC - ResNet50 (c) APTOS vs. (EyePacs, Messidor, Messidorv2) - ResNet50 (d) CIFAR10 vs. CIFAR10.1 - DeiT-B (e) CheXpert vs. MIMIC - DeiT-B (f) APTOS vs. (EyePacs, Messidor, Messidorv2) - DeiT-B. Results for Uniform Soups are not plotted because the method performs poorly.
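
The Linear Mode Connectivity formula in the figure list, $\theta = \lambda \cdot \theta_{A} + (1 - \lambda) \cdot \theta_{B}$, can be traced numerically by evaluating an error function at evenly spaced values of $\lambda$. The sketch below assumes weights stored as dictionaries of float lists; `toy_error` is a hypothetical stand-in for a real validation-error evaluation, and the function names are illustrative.

```python
def interpolate(theta_a, theta_b, lam):
    """theta = lam * theta_a + (1 - lam) * theta_b, applied per parameter."""
    return {
        name: [lam * a + (1.0 - lam) * b for a, b in zip(theta_a[name], theta_b[name])]
        for name in theta_a
    }


def lmc_curve(theta_a, theta_b, error_fn, num_points=5):
    """Evaluate error_fn along the straight line between theta_b (lam = 0)
    and theta_a (lam = 1). Requires num_points >= 2."""
    lams = [i / (num_points - 1) for i in range(num_points)]
    return [(lam, error_fn(interpolate(theta_a, theta_b, lam))) for lam in lams]


def toy_error(theta):
    """Hypothetical error surface with its minimum at w = 1 in every coordinate."""
    return sum((w - 1.0) ** 2 for w in theta["w"])


if __name__ == "__main__":
    theta_a = {"w": [0.0, 2.0]}
    theta_b = {"w": [4.0, 6.0]}
    for lam, err in lmc_curve(theta_a, theta_b, toy_error):
        print(f"lambda={lam:.2f}  error={err:.3f}")
```

A curve that stays low between the two endpoints suggests the models lie in a connected low-error region and can be averaged safely; a bump in the middle indicates a barrier between them.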