Supervised Transfer Learning at Scale for Medical Imaging

Basil Mustafa, Aaron Loh, Jan Freyberg, Patricia MacWilliams, Megan Wilson, Scott Mayer McKinney, Marcin Sieniek, Jim Winkens, Yuan Liu, Peggy Bui, Shruthi Prabhakara, Umesh Telang, Alan Karthikesalingam, Neil Houlsby, Vivek Natarajan

TL;DR

This work investigates whether large-scale supervised pre-training on natural images can effectively transfer to medical imaging, despite substantial domain differences. By evaluating Big Transfer (BiT) models pre-trained on ImageNet, ImageNet-21k, and JFT-300M across Mammography, CheXpert, and Dermatology, the study analyzes accuracy, distribution-shift robustness, data efficiency, fairness, calibration, and model understanding. The key finding is that with sufficient scale in both architecture and pre-training data, cross-domain transfer yields improved performance, better generalization under distribution shifts, and data-efficient learning without harming fairness or uncertainty estimation; deeper analyses suggest enhanced reuse of high-level features. These results support practical adoption of large-scale natural-image pretraining for medical-imaging tasks and highlight the continued relevance of scaling in transfer learning, even when domain gaps exist.

Abstract

Transfer learning is a standard technique to improve performance on tasks with limited data. For medical imaging, however, the value of transfer learning is less clear, likely because of the large domain mismatch between the usual natural-image pre-training (e.g. ImageNet) and medical images. Recent advances in transfer learning have nonetheless shown substantial improvements from scale. We investigate whether modern methods can change the fortune of transfer learning for medical imaging. For this, we study the class of large-scale pre-trained networks presented by Kolesnikov et al. on three diverse imaging tasks: chest radiography, mammography, and dermatology. We study both transfer performance and properties critical for deployment in the medical domain, including out-of-distribution generalization, data efficiency, sub-group fairness, and uncertainty estimation. Interestingly, we find that for some of these properties transfer from natural to medical images is indeed extremely effective, but only when performed at sufficient scale.

Paper Structure

This paper contains 55 sections, 21 figures, 2 tables.

Figures (21)

  • Figure 1: Transfer learning is well established for natural image tasks, and ImageNet is frequently used for pre-training and/or evaluation. Further, like-to-like transfer within the medical domain has been shown to work [heker2020joint, liang2020transfer, pmlr-v102-geyer19a, chen2019med3d]. However, the effectiveness of transfer from natural image datasets to medical imaging is debated [raghu2019transfusion]. We study this regime in the context of modern transfer methods to better understand the state of the field in this important, yet challenging, domain.
  • Figure 2: Transfer learning performance on the held-out test set for the Mammography, CheXpert, and Dermatology tasks. On all tasks, BiT models outperform an R50x1 baseline; generally there are benefits to scaling pre-training as well as the architecture size.
  • Figure 3: Number of epochs, relative to baseline, required to converge. Larger pre-trained models reduce the number of steps required to converge.
  • Figure 4: Relative performance compared to the baseline model at various resolutions. For Mammography and Dermatology, larger models tend to yield greater improvements at higher resolutions; note that the baseline also improves with resolution (not visible in the figure). For CheXpert, within the range of resolutions tried, performance was independent of resolution.
  • Figure 5: Generalization of models to other datasets in the same domain. Increasing pre-training scale and architecture scale generally improves robustness to distribution shift on all the tasks considered.
  • ...and 16 more figures