Large-scale modality-invariant foundation models for brain MRI analysis: Application to lesion segmentation
Petros Koutsouvelis, Matej Gazda, Leroy Volmer, Sina Amirrajab, Kamil Barbierik, Branislav Setlak, Jakub Gazda, Peter Drotar
TL;DR
This work examines large-scale self-supervised pre-training for brain MRI that combines a modality-invariant representation objective (MCL) with masked image modeling (MIM), evaluated on lesion segmentation in stroke and epilepsy. MCL achieves cross-modality embedding alignment but does not improve per-modality lesion segmentation beyond a strong MIM baseline; segmentation performance remains highly modality-dependent, underscoring the importance of modality-specific texture and contrast. The results suggest that invariant representations may suit global brain tasks better than fine-grained local pathology. Pretrained checkpoints are released to support future research. Overall, the findings point the design of neuroimaging foundation models toward balancing cross-modality alignment with preservation of modality-specific information for segmentation tasks.
Abstract
The field of computer vision is undergoing a paradigm shift toward large-scale foundation model pre-training via self-supervised learning (SSL). Leveraging large volumes of unlabeled brain MRI data, such models can learn anatomical priors that improve few-shot performance in diverse neuroimaging tasks. However, most SSL frameworks are tailored to natural images, and their adaptation to capture multi-modal MRI information remains underexplored. This work proposes a modality-invariant representation learning setup and, after large-scale pre-training, evaluates it on stroke and epilepsy lesion segmentation. Experimental results suggest that, despite successful cross-modality alignment, lesion segmentation benefits primarily from preserving fine-grained modality-specific features. Model checkpoints and code are made publicly available.
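To make the idea of pairing a modality-invariance objective with masked image modeling concrete, the sketch below shows one common way such a combined loss can be written. It is an illustration under stated assumptions, not the objective used in this work: the abstract does not specify the form of the alignment term, so a symmetric InfoNCE-style contrastive loss is assumed here, and the reconstruction term is assumed to be a mean squared error over masked positions. All function names, the `temperature` value, and the `align_weight` parameter are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_modality_infonce(z_a, z_b, temperature=0.1):
    """Assumed alignment term: symmetric InfoNCE.

    Row i of z_a and row i of z_b are embeddings of the same subject
    in two different modalities; all other rows act as negatives.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature
    idx = np.arange(len(z_a))
    loss_ab = -log_softmax(logits)[idx, idx].mean()
    loss_ba = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_ab + loss_ba)


def masked_reconstruction_loss(pred, target, mask):
    """Assumed MIM term: mean squared error on masked positions only.

    mask is a boolean array (True = position hidden from the encoder)
    and must contain at least one True entry.
    """
    return ((pred - target) ** 2)[mask].mean()


def combined_loss(z_a, z_b, pred, target, mask, align_weight=1.0):
    # Hypothetical weighting between reconstruction and alignment.
    recon = masked_reconstruction_loss(pred, target, mask)
    align = cross_modality_infonce(z_a, z_b, temperature=0.1)
    return recon + align_weight * align
```

In a sketch like this, `align_weight` controls the trade-off the abstract points to: a larger value pulls embeddings of different modalities together, while a smaller value leaves the reconstruction term freer to retain modality-specific texture and contrast.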
