Robust and Efficient Medical Imaging with Self-Supervision

Shekoofeh Azizi, Laura Culp, Jan Freyberg, Basil Mustafa, Sebastien Baur, Simon Kornblith, Ting Chen, Patricia MacWilliams, S. Sara Mahdavi, Ellery Wulczyn, Boris Babenko, Megan Wilson, Aaron Loh, Po-Hsuan Cameron Chen, Yuan Liu, Pinal Bavishi, Scott Mayer McKinney, Jim Winkens, Abhijit Guha Roy, Zach Beaver, Fiona Ryan, Justin Krogue, Mozziyar Etemadi, Umesh Telang, Yun Liu, Lily Peng, Greg S. Corrado, Dale R. Webster, David Fleet, Geoffrey Hinton, Neil Houlsby, Alan Karthikesalingam, Mohammad Norouzi, Vivek Natarajan

TL;DR

REMEDIS tackles the persistent challenge of out-of-distribution generalization in medical imaging by combining large-scale supervised pretraining with intermediate contrastive self-supervised learning on unlabeled medical data, followed by task-specific fine-tuning with limited labels. Across six clinically diverse tasks and 15 evaluation sets, REMEDIS delivers stronger in- and out-of-distribution performance and matches strong supervised baselines using between 1% and 33% of the retraining data, depending on the task. This reduces annotation costs and clinician-hours, which can make deployment of medical imaging AI faster and more scalable. The results highlight the practical potential of combining BiT-style supervised pretraining with contrastive self-supervision to build broadly transferable, data-efficient medical AI systems across modalities.

Abstract

Recent progress in Medical Artificial Intelligence (AI) has delivered systems that can reach clinical-expert-level performance. However, such systems tend to demonstrate sub-optimal "out-of-distribution" performance when evaluated in clinical settings different from the training environment. A common mitigation strategy is to develop separate systems for each clinical setting using site-specific data [1]. However, this quickly becomes impractical as medical data is time-consuming to acquire and expensive to annotate [2]. Thus, the problem of "data-efficient generalization" presents an ongoing difficulty for Medical AI development. Although progress in representation learning shows promise, its benefits have not been rigorously studied, specifically for out-of-distribution settings. To meet these challenges, we present REMEDIS, a unified representation learning strategy to improve the robustness and data efficiency of medical imaging AI. REMEDIS uses a generic combination of large-scale supervised transfer learning with self-supervised learning and requires little task-specific customization. We study a diverse range of medical imaging tasks and simulate three realistic application scenarios using retrospective data. REMEDIS exhibits significantly improved in-distribution performance, with up to 11.5% relative improvement in diagnostic accuracy over a strong supervised baseline. More importantly, our strategy leads to strong data-efficient generalization of medical imaging AI, matching strong supervised baselines using between 1% and 33% of retraining data across tasks. These results suggest that REMEDIS can significantly accelerate the life-cycle of medical imaging AI development, thereby presenting an important step forward for medical imaging AI to deliver broad impact.
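
The TL;DR describes the intermediate stage as contrastive self-supervised learning on unlabeled medical images, but this page does not state the exact objective. As a rough illustration only, the sketch below assumes a SimCLR-style NT-Xent loss, in which two augmented views of the same image form a positive pair and all other embeddings in the batch act as negatives. The function name `nt_xent`, the temperature of 0.1, and the toy embeddings are hypothetical and not taken from the paper.

```python
import math


def nt_xent(z_a, z_b, temperature=0.1):
    """NT-Xent loss for N positive pairs (z_a[i], z_b[i]) of embedding vectors.

    Assumption: a SimCLR-style objective; the temperature is illustrative.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    n = len(z_a)
    z = [normalize(v) for v in list(z_a) + list(z_b)]  # 2N unit vectors
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Cosine similarities to every other embedding, scaled by temperature.
        logits = [dot(z[i], z[j]) / temperature for j in range(2 * n) if j != i]
        pos_logit = dot(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp over the positive and all negatives.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


# Toy check: matching views should give a much lower loss than mismatched ones.
views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = nt_xent(views_a, [[1.0, 0.0], [0.0, 1.0]])
mismatched = nt_xent(views_a, [[0.0, 1.0], [1.0, 0.0]])
```

In a real pipeline the embeddings would come from the pretrained encoder applied to two random augmentations of each unlabeled image, and the loss would be minimized with an automatic-differentiation framework rather than evaluated in pure Python.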

Paper Structure (66 sections, 1 equation, 19 figures, 28 tables)


Figures (19)

  • Figure 1: Overview of our proposed approach for developing robust and efficient medical imaging AI. REMEDIS starts with representations initialized using large-scale natural image pretraining following the Big Transfer (BiT) method (Kolesnikov et al., 2019). We then adapt the model to the medical domain using intermediate contrastive self-supervised learning without using any labeled medical data. Finally, we fine-tune the model to specific downstream medical imaging AI tasks. We evaluate the AI model both in an in-distribution (ID) setting and in an out-of-distribution (OOD) setting to establish the data-efficient generalization performance of the model.
  • Figure 2: Overview of clinical settings for evaluating data-efficient generalization of medical imaging AI. We evaluate our self-supervision-based representation learning method, REMEDIS, as well as baseline AI models on five different domains containing six tasks. These clinical settings cover a wide and complex variety of distribution shifts, as detailed above.
  • Figure 3: Data-efficient generalization results. Overall performance and data-efficient generalization of our proposed self-supervised learning method, REMEDIS, and of the strong supervised baseline pretrained on JFT-300M, for dermatology condition classification ($T_1$), diabetic macular edema classification ($T_2$), chest X-ray condition classification ($T_3$), pathology metastases detection ($T_4$), pathology colorectal survival prediction ($T_5$), and mammography classification ($T_6$). We observed significantly improved out-of-distribution generalization and a significant reduction in the need for labeled medical data when using our proposed approach. 95% confidence intervals were calculated by running each label fraction and experiment up to ten times; intervals are shown as shaded areas and error bars. A two-sided $t$-test was also performed for each label fraction and for the in-distribution results. If no * is shown, the $p$-value is less than 0.001; otherwise, the $p$-value is as indicated. The red lines indicate the amount of data that REMEDIS needs to match the highest supervised AI baseline performance when simulated in a new OOD clinical deployment setting, and summarize the amount of annotated data and clinician hours potentially saved by using REMEDIS for each medical task considered.
  • Figure A.1: Overview of our experimental setup for the development of REMEDIS and the baseline AI models across the various medical imaging tasks. In particular, we detail the different stages in which unlabeled and labeled data (both in-distribution and out-of-distribution) are used for model development and evaluation.
  • Figure A.2: Visual samples of distribution shifts across the medical imaging tasks considered in this study. Variation between in-distribution and out-of-distribution data can be visually subtle or pronounced. This variation includes but is not limited to changes in contrast, sharpness or tint, differences in non-linear effects of X-ray sensor construction, or differences in zoom levels. The underlying cause of the distribution shift can be associated with technology shift, demographic shift, or behavioural shift (Finlayson et al., 2020).
  • ...and 14 more figures
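
The Figure 3 caption reports 95% confidence intervals over up to ten repeated runs and a two-sided $t$-test per label fraction. The caption does not say how these were computed, so the following is only a minimal sketch under stated assumptions: a Student's $t$ interval around the mean of the runs, and Welch's unequal-variance form of the two-sample test. The helper names `mean_ci95` and `welch_t` and the scores are hypothetical and do not come from the paper.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for df = 1..9 (standard table),
# which covers "up to ten" repeated runs.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_ci95(scores):
    """Return (mean, half_width) of a 95% t-interval over repeated runs."""
    n = len(scores)
    if not 2 <= n <= 10:
        raise ValueError("expected between 2 and 10 runs")
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return mean, T_CRIT_95[n - 1] * sem


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom.

    Assumption: the caption only says "two-sided t-test"; Welch's variant is
    used here because it does not require equal variances.
    """
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical scores from three runs of each method at one label fraction.
method_runs = [0.80, 0.82, 0.84]
baseline_runs = [0.70, 0.72, 0.74]
mean, half_width = mean_ci95(method_runs)
t_stat, dof = welch_t(method_runs, baseline_runs)
```

The two-sided $p$-value then follows from the $t$ distribution with `dof` degrees of freedom, for example through a statistics library's survival function; it is left out here to keep the sketch dependency-free.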