Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations
Zhijian Yang, Noel DSouza, Istvan Megyeri, Xiaojian Xu, Amin Honarmandi Shandiz, Farzin Haddadpour, Krisztian Koos, Laszlo Rusko, Emanuele Valeriano, Bharadwaj Swaninathan, Lei Wu, Parminder Bhatia, Taha Kass-Hout, Erhan Bas
TL;DR
Decipher-MR is an MRI-specific 3D vision-language foundation model trained on 200,000 MRI series from over 22,000 studies. Radiology reports supply text supervision, and a two-stage pretraining pipeline aligns image and text representations. The pretrained encoder stays frozen while modular, task-specific decoders are tuned, which allows efficient adaptation to classification, retrieval, segmentation, and localization tasks; the model shows robust cross-domain performance and rapid convergence. In the reported experiments, Decipher-MR outperforms MRI-specific and general-purpose baselines on multiple tasks, demonstrates strong cross-modal retrieval, and gives competitive segmentation and anomaly localization results, which highlights its potential for scalable MRI AI in clinical and research settings. The work emphasizes data diversity, region-aware supervision, and lightweight decoders as the basis for generalizable, efficient MRI analysis, and it acknowledges biases and areas for future improvement, such as region-level alignment and broader textual diversity.
Abstract
Magnetic resonance imaging (MRI) is a critical imaging modality in clinical diagnosis and research, yet its complexity and heterogeneity hinder scalable, generalizable machine learning. Although foundation models have revolutionized language and vision tasks, their application to MRI remains constrained by data scarcity and narrow anatomical focus. We present Decipher-MR, a 3D MRI-specific vision-language foundation model trained on 200,000 MRI series from over 22,000 studies spanning diverse anatomical regions, sequences, and pathologies. Decipher-MR integrates self-supervised vision learning with report-guided text supervision to build robust representations for broad applications. For efficient use, Decipher-MR adopts a modular design in which lightweight, task-specific decoders are tuned on top of a frozen pretrained encoder. Under this setting, we evaluate Decipher-MR across disease classification, demographic prediction, anatomical localization, and cross-modal retrieval, demonstrating consistent improvements over existing foundation models and task-specific approaches. These results position Decipher-MR as a versatile foundation for MRI-based AI in clinical and research settings.
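The frozen-encoder setting described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the Decipher-MR implementation: `FrozenEncoder`, `LinearHead`, `train_head`, the 4-voxel toy "volumes", and the bias-free logistic head are all hypothetical stand-ins chosen for illustration. The only point it makes is structural: embeddings come from fixed weights, and training updates the small decoder alone.

```python
import math


class FrozenEncoder:
    """Hypothetical stand-in for a pretrained 3D encoder (NOT Decipher-MR).

    A fixed linear projection from a flattened 4-voxel toy volume
    to a 2-dimensional embedding.
    """

    def __init__(self):
        # Tuples are immutable, mirroring the fact that no training step
        # ever modifies the encoder.
        self.weights = (
            (0.25, 0.25, 0.25, 0.25),  # mean intensity
            (0.5, -0.5, 0.5, -0.5),    # alternating-voxel contrast
        )

    def embed(self, volume):
        return [sum(w * v for w, v in zip(row, volume)) for row in self.weights]


class LinearHead:
    """Toy task-specific decoder: logistic regression (bias omitted for brevity)."""

    def __init__(self, emb_dim):
        self.w = [0.0] * emb_dim

    def predict_proba(self, z):
        logit = sum(w * v for w, v in zip(self.w, z))
        return 1.0 / (1.0 + math.exp(-logit))

    def sgd_step(self, z, label, lr):
        # For binary cross-entropy, the gradient w.r.t. the logit is (p - y).
        err = self.predict_proba(z) - label
        self.w = [w - lr * err * v for w, v in zip(self.w, z)]


def train_head(encoder, head, volumes, labels, epochs=50, lr=0.5):
    # Embeddings are computed once and reused, since the encoder never changes.
    embeddings = [encoder.embed(v) for v in volumes]
    for _ in range(epochs):
        for z, y in zip(embeddings, labels):
            head.sgd_step(z, y, lr)
    return head


# Made-up toy data: label 1 has positive mean intensity, label 0 negative.
train_volumes = [
    [1.0, 0.8, 1.2, 1.0],
    [0.9, 1.1, 0.7, 1.3],
    [-1.0, -0.8, -1.2, -1.0],
    [-0.9, -1.1, -0.7, -1.3],
]
train_labels = [1, 1, 0, 0]

encoder = FrozenEncoder()
weights_before = encoder.weights
head = train_head(encoder, LinearHead(emb_dim=2), train_volumes, train_labels)
predictions = [
    int(head.predict_proba(encoder.embed(v)) > 0.5) for v in train_volumes
]
```

Because the encoder is never updated, its embeddings can be cached once and shared across several heads. This is one plausible reason a frozen-encoder design keeps per-task tuning cheap.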
