Supervised Transfer Learning at Scale for Medical Imaging
Basil Mustafa, Aaron Loh, Jan Freyberg, Patricia MacWilliams, Megan Wilson, Scott Mayer McKinney, Marcin Sieniek, Jim Winkens, Yuan Liu, Peggy Bui, Shruthi Prabhakara, Umesh Telang, Alan Karthikesalingam, Neil Houlsby, Vivek Natarajan
TL;DR
This work investigates whether large-scale supervised pre-training on natural images transfers effectively to medical imaging despite the substantial domain gap. It evaluates Big Transfer (BiT) models pre-trained on ImageNet, ImageNet-21k, and JFT-300M on three tasks (mammography, chest radiography with CheXpert, and dermatology) and analyzes accuracy, robustness to distribution shift, data efficiency, fairness, calibration, and model understanding. The key finding is that, with sufficient scale in both architecture and pre-training data, cross-domain transfer improves performance, generalizes better under distribution shift, and learns more data-efficiently, without harming fairness or uncertainty estimation; deeper analyses suggest enhanced reuse of high-level features. These results support practical adoption of large-scale natural-image pre-training for medical-imaging tasks and highlight the continued relevance of scale in transfer learning, even when domain gaps exist.
Abstract
Transfer learning is a standard technique for improving performance on tasks with limited data. For medical imaging, however, its value is less clear, likely because of the large domain mismatch between the usual natural-image pre-training (e.g. ImageNet) and medical images. Recent advances in transfer learning have nonetheless shown substantial improvements from scale. We investigate whether modern methods can change the fortune of transfer learning for medical imaging. To this end, we study the class of large-scale pre-trained networks presented by Kolesnikov et al. on three diverse imaging tasks: chest radiography, mammography, and dermatology. We study both transfer performance and properties critical for deployment in the medical domain, including out-of-distribution generalization, data efficiency, sub-group fairness, and uncertainty estimation. Interestingly, we find that for some of these properties transfer from natural to medical images is indeed extremely effective, but only when performed at sufficient scale.
