Towards Generalisable Foundation Models for Brain MRI

Moona Mazher, Geoff J. M. Parker, Daniel C. Alexander

TL;DR

BrainFound is a 3D-aware self-supervised foundation model for brain MRI that extends the 2D DINOv2 framework to volumetric data through a slice-wise strategy and multimodal input fusion. It is pretrained on large unlabelled MRI collections and fine-tuned on diverse downstream datasets. Across neurodegenerative and oncological tasks, it shows strong generalisation, few-shot learning capability, and competitive segmentation performance. The approach leverages natural-image priors via DINOv2 while adapting to domain-specific MRI features, which yields robust cross-dataset performance even when only some modalities are available. The work highlights BrainFound's potential for scalable, clinically relevant neuroimaging pipelines. It also outlines future directions toward fully 3D self-supervised learning (SSL) architectures and broader modality integration.
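The multimodal input fusion described above can be illustrated with a short Python sketch. It assumes that co-registered T1, T2 and FLAIR slices are stacked as the three channels of a 3 × 224 × 224 input. It also assumes per-slice min-max scaling, nearest-neighbour resizing, and zero-filling for any missing modality. The channel order, the helper name `fuse_modalities`, and the missing-modality rule are hypothetical and are not taken from the paper.

```python
import numpy as np

MODALITIES = ("T1", "T2", "FLAIR")  # assumed channel order (hypothetical)


def fuse_modalities(slices, size=224):
    """Stack co-registered 2D slices of T1, T2 and FLAIR into a 3 x size x size array.

    `slices` maps a modality name to a 2D array (H, W). Missing modalities are
    zero-filled; this is an assumed strategy, not the paper's stated method.
    """
    out = np.zeros((len(MODALITIES), size, size), dtype=np.float32)
    for c, name in enumerate(MODALITIES):
        img = slices.get(name)
        if img is None:
            continue  # modality unavailable: leave this channel at zero
        img = img.astype(np.float32)
        # Per-slice min-max normalisation to [0, 1] (assumption)
        lo, hi = img.min(), img.max()
        img = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
        # Nearest-neighbour resize to size x size via integer index arrays
        rows = np.arange(size) * img.shape[0] // size
        cols = np.arange(size) * img.shape[1] // size
        out[c] = img[np.ix_(rows, cols)]
    return out


# Usage: a case with T1 and FLAIR but no T2
t1 = np.random.rand(240, 240)
flair = np.random.rand(240, 240)
x = fuse_modalities({"T1": t1, "FLAIR": flair})
print(x.shape)  # (3, 224, 224)
```

Zero-filling keeps the input shape fixed when a dataset lacks a contrast. It is one simple way to handle partial modality availability. Duplicating an available contrast into the empty channel would be an equally plausible alternative.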

Abstract

Foundation models in artificial intelligence (AI) are transforming medical imaging by enabling general-purpose feature learning from large-scale, unlabelled datasets. In this work, we introduce BrainFound, a self-supervised foundation model for brain MRI, built by extending DINOv2, a vision transformer originally designed for 2D natural images. BrainFound adapts DINOv2 to model full 3D brain anatomy by incorporating volumetric information from sequential MRI slices, moving beyond conventional single-slice paradigms. It supports both single- and multimodal inputs, enabling a broad range of downstream tasks, including disease detection and image segmentation, while generalising across varied imaging protocols and clinical scenarios. We show that BrainFound consistently outperforms existing self-supervised pretraining strategies and supervised baselines, particularly in label-scarce and multi-contrast settings. By integrating information from diverse 3D MRI modalities (e.g., T1, T2, FLAIR), it enhances diagnostic accuracy and reduces dependency on extensive expert annotations. This flexibility makes BrainFound a scalable and practical solution for 3D neuroimaging pipelines, with significant potential for clinical deployment and research innovation.
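The abstract's idea of drawing volumetric context from sequential MRI slices can be sketched as a 2.5D input. Each axial slice is paired with its two neighbours to fill the three channels that a DINOv2-style backbone expects. This Python sketch is one plausible reading of the stacked 3 × 224 × 224 representation, not the authors' preprocessing. The function name `stacked_axial_slices`, the edge clamping, the volume-level normalisation and the nearest-neighbour resize are all assumptions.

```python
import numpy as np


def stacked_axial_slices(volume, size=224):
    """Turn a 3D volume (D, H, W), axial axis first, into D inputs of shape
    3 x size x size built from consecutive slices (z - 1, z, z + 1).

    Neighbour indices are clamped at the first and last slice (assumption).
    Returns an array of shape (D, 3, size, size).
    """
    vol = volume.astype(np.float32)
    # Volume-level min-max normalisation to [0, 1] (assumption)
    lo, hi = vol.min(), vol.max()
    vol = (vol - lo) / (hi - lo) if hi > lo else np.zeros_like(vol)

    depth, h, w = vol.shape
    rows = np.arange(size) * h // size  # nearest-neighbour resize indices
    cols = np.arange(size) * w // size
    out = np.empty((depth, 3, size, size), dtype=np.float32)
    for z in range(depth):
        neighbours = [max(z - 1, 0), z, min(z + 1, depth - 1)]
        for c, k in enumerate(neighbours):
            out[z, c] = vol[k][np.ix_(rows, cols)]
    return out


# Usage: 20 axial slices of 240 x 240 become 20 three-channel inputs
x = stacked_axial_slices(np.random.rand(20, 240, 240))
print(x.shape)  # (20, 3, 224, 224)
```

Using neighbouring slices as channels fits the three-channel input of a backbone pretrained on natural images. It gives each 2D input some through-plane context without changing the architecture.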

Paper Structure

This paper contains 34 sections, 13 figures, 1 table.

Figures (13)

  • Figure 1: Overview of the training pipeline for BrainFound and the baseline models. The baselines are SL_ImageNet, SSL_DINOv2, and SSL_BrainImages. SL_ImageNet uses supervised learning on ImageNet-21k, which contains 14 million images with categorical labels. SSL_DINOv2 was pretrained on the LVD-142M dataset. SSL_BrainImages applies self-supervised learning from scratch on brain images. BrainFound, by contrast, applies self-supervised learning on brain images starting from the SSL_DINOv2 pretrained weights. *A dog image is used to represent the natural-image LVD-142M dataset.
  • Figure 2: Overview of the MRI datasets and modalities used in the SSL and downstream workflow. (a) Number of MRI cases per dataset, colour-coded by disease type (Alzheimer’s disease, frontotemporal dementia (FTD), brain tumours, and fetal brain conditions), together with each dataset's role in the pipeline: self-supervised learning (SSL) only, or both SSL and fine-tuning. (b) Imaging modality availability (T1, T2, FLAIR) for each dataset, indicating the multimodal coverage relevant for pretraining. (c) Diagram of the overall framework: multiple datasets and modalities are used in a knowledge distillation-based SSL pretraining stage, followed by downstream fine-tuning that demonstrates generalisation across disease detection and image segmentation tasks.
  • Figure 3: Schematic architecture of BrainFound. (a) Multi-contrast MRI volumes (T1, T2, FLAIR) collected from 12 public datasets (~10,000 scans) are preprocessed and converted into stacked axial-slice representations (3 × 224 × 224). (b) A pretrained DINOv2 teacher–student SSL framework provides the initial weights for BrainFound, which is then adapted through domain-specific SSL on stacked MRI slices using probability-distribution matching. (c) Distribution of cases across contributing datasets, illustrating the heterogeneity and scale of the training corpus. (d) Downstream applications: BrainFound serves as a backbone for disease-detection models and as an encoder for lightweight image-segmentation decoders.
  • Figure 4: An overview of the development and application of BrainFound. Stage one pretrains BrainFound with self-supervised learning on large-scale multi-scan brain MRI datasets. Stage two adapts the pretrained model to multiple downstream disease-detection and image-segmentation tasks through supervised fine-tuning. Internal and external evaluation then assess generalisability across datasets.
  • Figure 5: AUROC comparison across pretraining strategies for Alzheimer's disease detection. Models were initialised with one of four pretraining methods: SL_ImageNet, SSL_DINOv2, SSL_BrainImages, or the proposed BrainFound. Each task was evaluated using 5-fold cross-validation. Bar height shows the mean AUROC across folds, and the five dots on each bar show the individual fold results, illustrating variability across folds. BrainFound consistently outperforms the baseline methods. Statistically significant improvements (P < 0.05, two-sided t-test; zhou2023foundation) are marked above the bars.
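Figures 2(c) and 3(b) describe the pretraining stage as teacher–student knowledge distillation via probability-distribution matching, starting from DINOv2 weights (Figure 1). Below is a minimal NumPy sketch of the image-level objective from the original DINO recipe. The teacher's outputs are centred and sharpened with a low temperature, and the student is trained to match them under cross-entropy. The teacher then tracks an exponential moving average (EMA) of the student. The temperatures and momenta are the commonly cited DINO defaults, used here as assumptions. DINOv2 additionally uses an iBOT masked-token loss, a KoLeo regulariser and Sinkhorn-Knopp centring, which this sketch omits. Nothing here should be read as BrainFound's exact configuration.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def dino_loss(student_logits, teacher_logits, center, tau_s=0.1, tau_t=0.04):
    """Cross-entropy between the centred, sharpened teacher distribution and
    the student distribution, averaged over the batch (original DINO form)."""
    p_t = softmax((teacher_logits - center) / tau_t)  # teacher targets, no gradient
    log_p_s = log_softmax(student_logits / tau_s)
    return float(-(p_t * log_p_s).sum(axis=-1).mean())


def update_center(center, teacher_logits, momentum=0.9):
    """EMA of the batch-mean teacher output; centring helps avoid collapse."""
    return momentum * center + (1 - momentum) * teacher_logits.mean(axis=0)


def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights follow an exponential moving average of the student's."""
    return {k: momentum * teacher_params[k] + (1 - momentum) * student_params[k]
            for k in teacher_params}


# Usage with random projection-head outputs (batch of 8, 64 prototypes)
rng = np.random.default_rng(0)
student, teacher = rng.normal(size=(8, 64)), rng.normal(size=(8, 64))
center = np.zeros(64)
loss = dino_loss(student, teacher, center)
center = update_center(center, teacher)
```

In this scheme only the student receives gradients from the loss. The teacher is never trained directly; it changes only through `ema_update`. Under this sketch's assumptions, initialising both networks from the same pretrained DINOv2 weights is what lets natural-image priors carry over into the brain-MRI adaptation.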
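Figure 5 reports the mean AUROC over five cross-validation folds and a two-sided t-test between methods. The sketch below computes AUROC through the Mann-Whitney U statistic and a t statistic over per-fold scores. The caption does not say whether the test is paired. Because every method is evaluated on the same folds, this sketch assumes a paired test. The fold scores in the usage lines are made-up placeholders, not results from the paper. For a p-value, `scipy.stats.ttest_rel` returns a two-sided result by default.

```python
import math

import numpy as np


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic; tied scores get their average rank."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for v in np.unique(scores):  # average ranks within each tie group
        tie = scores == v
        ranks[tie] = ranks[tie].mean()
    n_pos, n_neg = labels.sum(), (~labels).sum()
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def paired_t_statistic(a, b):
    """t statistic for paired per-fold scores (n - 1 degrees of freedom)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean() / (d.std(ddof=1) / math.sqrt(len(d)))


# Usage with placeholder per-fold AUROCs (not values from the paper)
method_a = [0.91, 0.93, 0.90, 0.92, 0.91]
method_b = [0.86, 0.88, 0.85, 0.87, 0.87]
print(np.mean(method_a), paired_t_statistic(method_a, method_b))
```

With only five folds the test has four degrees of freedom. This is one reason the per-fold dots in Figure 5 are informative alongside the mean.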
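Figure 3(d) describes BrainFound as an encoder for lightweight segmentation decoders. One of the simplest such decoders is a linear head on the patch tokens. With DINOv2's 14-pixel patches, a 224 × 224 input yields a 16 × 16 token grid. Each token is mapped to class logits, and the logit map is upsampled back to image resolution. This NumPy sketch assumes the CLS token has been dropped and uses nearest-neighbour upsampling. The decoder actually used in the paper may differ. The name `linear_seg_head`, its weights, and the 768-dimensional tokens in the usage lines are hypothetical.

```python
import numpy as np

PATCH = 14  # DINOv2 ViT patch size
IMG = 224   # input side length, per the 3 x 224 x 224 slices in Figure 3(a)


def linear_seg_head(patch_tokens, weight, bias):
    """Map ViT patch tokens to per-pixel class logits.

    patch_tokens: (N, C) with N = (IMG // PATCH) ** 2, CLS token already removed.
    weight: (C, K) and bias: (K,) for K classes (hypothetical learned parameters).
    Returns logits of shape (K, IMG, IMG).
    """
    grid = IMG // PATCH                                          # 16 x 16 token grid
    logits = patch_tokens @ weight + bias                        # (N, K) per-token logits
    logits = logits.reshape(grid, grid, -1).transpose(2, 0, 1)   # (K, 16, 16)
    # Nearest-neighbour upsampling: each token covers one PATCH x PATCH block
    return np.repeat(np.repeat(logits, PATCH, axis=1), PATCH, axis=2)


# Usage: 256 patch tokens from an assumed ViT-B/14 encoder, 4 segmentation classes
rng = np.random.default_rng(0)
tokens = rng.normal(size=(256, 768))
w, b = rng.normal(size=(768, 4)), np.zeros(4)
mask = linear_seg_head(tokens, w, b).argmax(axis=0)
print(mask.shape)  # (224, 224)
```

A head this small adds few trainable parameters on top of the encoder. The quality of the resulting masks therefore depends mostly on how informative the pretrained patch features are.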