A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning
Stefano Cerri, Asbjørn Munk, Jakob Ambsdorf, Julia Machnio, Sebastian Nørgaard Llambias, Vardan Nersesjan, Christian Hedeager Krag, Peirong Liu, Pablo Rocamora García, Mostafa Mehdipour Ghazi, Mikael Boesen, Michael Eriksen Benros, Juan Eugenio Iglesias, Mads Nielsen
TL;DR
The FOMO300K dataset is a large-scale, heterogeneous dataset of 318,877 brain Magnetic Resonance Imaging scans from 82,678 MRI sessions and 59,969 subjects, aggregated from 920 publicly available sources to support the development and benchmarking of self-supervised learning methods in medical imaging at scale.
Abstract
We present FOMO300K, a large-scale, heterogeneous dataset of 318,877 brain Magnetic Resonance Imaging (MRI) scans from 82,678 MRI sessions and 59,969 subjects, aggregated from 920 publicly available sources. The dataset includes both clinical- and research-grade images, multiple MRI sequences, and a wide range of anatomical and pathological variability, including scans with large brain anomalies. Minimal preprocessing was applied to preserve the original image characteristics while reducing entry barriers for new users. Companion code for self-supervised pretraining and finetuning is provided, along with pretrained models. FOMO300K is intended to support the development and benchmarking of self-supervised learning methods in medical imaging at scale.
