On dataset transferability in medical image classification

Dovile Juodelyte; Enzo Ferrante; Yucheng Lu; Prabhant Singh; Joaquin Vanschoren; Veronika Cheplygina

TL;DR

This work addresses the poor fit of existing transferability estimation methods to medical image classification by introducing a gradient-aware metric. The metric combines feature quality with adaptability: a forward pass yields an NCA-based representation, and a backward pass yields a gradient ratio. The paper formalizes the problem of source-model selection without costly fine-tuning, presents two new evaluation setups tailored to medical imaging, and provides ground-truth benchmarking across MedMNIST and cross-domain transfers. Empirically, the proposed method outperforms feature-only metrics on several targets, shows that source-data diversity matters more than sheer size, and reveals distinct cross-domain transfer dynamics that require domain-specific modeling. The study offers a practical framework and data resources for transferability estimation in medical imaging and encourages exploration beyond ImageNet-centric pre-training.

Abstract

Current transferability estimation methods designed for natural image datasets are often suboptimal in medical image classification. These methods primarily focus on estimating the suitability of pre-trained source model features for a target dataset, which can lead to unrealistic predictions, such as suggesting that the target dataset is the best source for itself. To address this, we propose a novel transferability metric that combines feature quality with gradients to evaluate both the suitability and adaptability of source model features for target tasks. We evaluate our approach in two new scenarios: source dataset transferability for medical image classification and cross-domain transferability. Our results show that our method outperforms existing transferability metrics in both settings. We also provide insight into the factors influencing transfer performance in medical image classification, as well as the dynamics of cross-domain transfer from natural to medical images. Additionally, we provide ground-truth transfer performance benchmarking results to encourage further research into transferability estimation for medical image classification. Our code and experiments are available at https://github.com/DovileDo/transferability-in-medical-imaging.
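To make the "feature quality" half of the metric more concrete, the following is a minimal sketch of a Neighborhood Component Analysis (NCA) style separability score. It assumes the features have already been extracted by a forward pass, and it evaluates only the standard NCA leave-one-out objective on those fixed features; it does not learn a projection and does not include the gradient term. The function name `nca_score` and the toy inputs are hypothetical and are not taken from the authors' code.

```python
import math


def nca_score(features, labels):
    """Mean leave-one-out probability that a point selects a same-class neighbour.

    Assumption: this is the plain NCA objective evaluated on fixed features,
    used here only as an illustration of a feature-separability score.
    """
    n = len(features)
    total = 0.0
    for i in range(n):
        # Squared Euclidean distances from point i to every other point.
        dists = [
            sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            for j in range(n)
            if j != i
        ]
        others = [labels[j] for j in range(n) if j != i]
        # Softmax over negative distances, shifted by the minimum for stability.
        d_min = min(dists)
        weights = [math.exp(-(d - d_min)) for d in dists]
        same = sum(w for w, y in zip(weights, others) if y == labels[i])
        total += same / sum(weights)
    return total / n


# Toy example: two tight, well-separated clusters.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
aligned = nca_score(points, [0, 0, 1, 1])   # labels follow the clusters
shuffled = nca_score(points, [0, 1, 0, 1])  # labels cut across the clusters
```

When labels follow the cluster structure the score approaches 1, and when they cut across it the score approaches 0, which is the sense in which such a score reflects how suitable a set of features is for a target task.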

Paper Structure (26 sections, 8 equations, 6 figures, 5 tables)

Introduction
Related work
Dataset similarity
Task similarity
Embedding-based techniques
Dataset distributions
Transferability metrics
Static features
Modeling changes that occur during fine-tuning
Our approach
Transferability in medical imaging
Method
Problem definition
Gradient-based transferability estimation
Forward-pass
...and 11 more sections

Figures (6)

Figure 1: Illustration of the transferability estimation problem: Given a model zoo, the goal is to predict which model will achieve higher performance after fine-tuning on a specific target task.
Figure 2: Overview of our method. We use Neighborhood Component Analysis (NCA) on feature representations obtained from a forward pass of the target dataset to model fine-tuning dynamics and estimate the source model's feature suitability for the target task. This estimate is then combined with the ratio of gradients from the second and first convolutional layers, obtained from the backward pass, to estimate the magnitude of feature map updates in these layers during fine-tuning.
Figure 3: t-SNE projections of feature representations $\hat{\boldsymbol{x}}$, for binary Pneumonia classification: (a) before fine-tuning the source model, (b) after fine-tuning, (c) after NCA projection, and (d) after LDA projection. The NCA projection (c) more closely approximates the fine-tuning dynamics, which update the features to achieve better class separability (b), compared to the LDA projection (d).
Figure 4: Transfer performance (AUC) of source datasets (y-axis) evaluated on target test sets. Source datasets are sorted by size, from smallest to largest. The grey dashed line marks the best transfer performance for each target for easier comparison. Overall, we do not see a relationship between source data size and transfer performance.
Figure 5: Ground-truth transfer performance $P(\phi_m, \mathcal{T})$ (test AUC) on the x-axis versus transferability score $S(\phi_m, \mathcal{T})$ on the y-axis. The predicted transferability scores are shown for LogME, LEEP, SFDA, PARC, NCTI, $\mathcal{N}$LEEP, and our method (columns) across 11 medical target datasets (rows). The black line represents the regression line, with the 95% confidence interval shaded in grey.
...and 1 more figure
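
Figure 5 compares transferability scores $S(\phi_m, \mathcal{T})$ with ground-truth transfer performance $P(\phi_m, \mathcal{T})$. One common way to summarize such a comparison is a rank correlation between the two. The sketch below uses Kendall's tau as an assumption; this page does not state which correlation measure the paper reports, and the numbers are made up for illustration.

```python
def kendall_tau(scores, performance):
    """Kendall's tau-a between two equally long sequences.

    Tied pairs count as neither concordant nor discordant.
    """
    n = len(scores)
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (scores[i] - scores[j]) * (performance[i] - performance[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical scores S and test AUCs P for three source models on one target.
S = [1.0, 2.0, 3.0]
P = [0.70, 0.90, 0.80]
tau = kendall_tau(S, P)  # two concordant pairs, one discordant pair
```

A value of 1 means the metric ranks the source models exactly as fine-tuning does, and a value of -1 means the ranking is fully reversed.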
