MedMAE: A Self-Supervised Backbone for Medical Imaging Tasks
Anubhav Gupta, Islam Osman, Mohamed S. Shehata, John W. Braun
TL;DR
This work tackles data scarcity and domain shift in medical imaging by compiling a large unlabeled Medical Imaging Dataset (MID) and training a Vision Transformer–based MedMAE backbone via Masked Autoencoder pretraining. The approach yields a versatile, domain-specific representation, learned through self-supervision, that transfers effectively to diverse medical tasks, including quality control, cancer prediction, pneumonia detection, and segmentation. Across four tasks, MedMAE outperforms ImageNet-pretrained and standard MAE baselines, with average gains of around 8%. These results demonstrate the value of domain-specific self-supervised pretraining for medical imaging and point toward continual learning as a route to multi-task, single-model deployment.
Abstract
Medical imaging tasks are challenging because few large labeled datasets are publicly available. As a result, it is difficult to achieve high performance with existing deep learning models, which require massive labeled datasets to be trained effectively. An alternative is to fine-tune pre-trained models on the medical imaging dataset. However, existing pre-trained models are generally trained on natural images, a domain very different from medical imaging, and the resulting domain shift leads to poor performance. To overcome these problems, we propose a large-scale unlabeled dataset of medical images and a backbone pre-trained on it with a self-supervised learning technique, the Masked Autoencoder (MAE). Because it is trained to learn visual representations of many types of medical images, this backbone can serve as a pre-trained model for any medical imaging task. We evaluate the proposed backbone on four different medical imaging tasks and compare the results with those of existing pre-trained models. These experiments show the superiority of the proposed backbone on medical imaging tasks.
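To make the Masked Autoencoder pretraining objective concrete, the following is a minimal, pure-Python sketch of its two core ingredients: splitting an image into patches and randomly masking most of them, then scoring reconstruction only on the masked patches. It is an illustration of the generic MAE recipe, not the MedMAE implementation. The function names, the toy 8x8 image, the patch size, and the zero-predicting placeholder "decoder" are all hypothetical; the mask ratio of 0.75 is the default reported for the original MAE and is only assumed here. The ViT encoder and decoder are omitted entirely.

```python
import random
from typing import List, Tuple

Patch = List[float]


def patchify(image: List[List[float]], patch_size: int) -> List[Patch]:
    """Split an H x W grayscale image into non-overlapping square patches, each flattened row-major."""
    h, w = len(image), len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0, "image size must be divisible by patch_size"
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = [image[top + i][left + j] for i in range(patch_size) for j in range(patch_size)]
            patches.append(patch)
    return patches


def random_masking(num_patches: int, mask_ratio: float, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Return (visible_ids, masked_ids): a random subset is kept for the encoder, the rest is masked."""
    num_keep = int(num_patches * (1 - mask_ratio))
    ids = list(range(num_patches))
    rng.shuffle(ids)
    return sorted(ids[:num_keep]), sorted(ids[num_keep:])


def masked_mse(pred: List[Patch], target: List[Patch], masked_ids: List[int]) -> float:
    """Mean squared pixel error computed only over the masked patches."""
    total, count = 0.0, 0
    for idx in masked_ids:
        for p, t in zip(pred[idx], target[idx]):
            total += (p - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy 8x8 "scan" with distinct pixel values.
    image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    patches = patchify(image, patch_size=2)  # 16 patches of 4 pixels each
    visible, masked = random_masking(len(patches), mask_ratio=0.75, rng=rng)
    print(len(visible), len(masked))  # 4 12
    # Placeholder "decoder" predicting zeros, only to show the loss ignores visible patches.
    pred = [[0.0] * len(p) for p in patches]
    print(masked_mse(pred, patches, masked))
```

In the real method, only the visible patches would be passed through the encoder, and a lightweight decoder would predict the pixels of the masked ones; computing the loss only on masked patches is what forces the backbone to learn representations of the underlying anatomy rather than copy its input.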
