An OpenMind for 3D medical vision self-supervised learning
Tassilo Wald, Constantin Ulrich, Jonathan Suprijadi, Sebastian Ziegler, Michal Nohel, Robin Peretzke, Gregor Köhler, Klaus H. Maier-Hein
TL;DR
This work tackles the lack of standardization in 3D medical SSL by introducing OpenMind, the largest public pre-training dataset for 3D brain MRI across 23 modalities, and a standardized OpenMind Benchmark to compare CNN and Transformer SSL approaches on diverse downstream tasks. It demonstrates that reconstruction-based pre-training (notably MAE) yields strong segmentation performance, while contrastive methods excel in classification, with Transformers like Primus-M showing meaningful gains when pre-trained. The study emphasizes the critical roles of fine-tuning schedules, data quality/diversity, and privacy-aware preprocessing, and provides open-source code and pretrained checkpoints to enable rapid reproduction and further method development. Overall, OpenMind serves as a foundation for data-centric and architecture-agnostic progress in 3D SSL for medical imaging and highlights directions for future improvements, including PEFT approaches and better cross-task generalization.
Abstract
The field of self-supervised learning (SSL) for 3D medical images lacks consistency and standardization. While many methods have been developed, it is impossible to identify the current state of the art, due to i) varying and small pre-training datasets, ii) varying architectures, and iii) evaluation on differing downstream datasets. In this paper, we bring clarity to this field and lay the foundation for further method advancements through three key contributions: We a) publish the largest publicly available pre-training dataset comprising 114k 3D brain MRI volumes, enabling all practitioners to pre-train on a large-scale dataset. We b) benchmark existing 3D self-supervised learning methods on this dataset for state-of-the-art CNN and Transformer architectures, clarifying the state of 3D SSL pre-training. Among many findings, we show that pre-trained methods can exceed a strong from-scratch nnU-Net ResEnc-L baseline. Lastly, we c) publish the code of our pre-training and fine-tuning frameworks and provide the pre-trained models created during the benchmarking process to facilitate rapid adoption and reproduction.
