Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations

Zhijian Yang; Noel DSouza; Istvan Megyeri; Xiaojian Xu; Amin Honarmandi Shandiz; Farzin Haddadpour; Krisztian Koos; Laszlo Rusko; Emanuele Valeriano; Bharadwaj Swaninathan; Lei Wu; Parminder Bhatia; Taha Kass-Hout; Erhan Bas

TL;DR

Decipher-MR is an MRI-specific 3D vision-language foundation model trained on 200,000 MRI series from over 22,000 studies, using radiology report supervision and a two-stage pretraining pipeline to align image and text representations. The model pairs a frozen encoder with modular, task-specific decoders, which allows efficient adaptation to classification, retrieval, segmentation, and localization tasks, and it shows robust cross-domain performance and rapid convergence. Across extensive experiments, Decipher-MR outperforms MRI-specific and general-purpose baselines on multiple tasks, demonstrates strong cross-modal retrieval, and delivers competitive segmentation and anomaly localization results, highlighting its potential for scalable MRI AI in clinical and research settings. The work emphasizes the importance of data diversity, region-aware supervision, and lightweight decoders for generalizable, efficient MRI analysis, while acknowledging biases and areas for future improvement such as region-level alignment and broader textual diversity.

Abstract

Magnetic Resonance Imaging (MRI) is a critical imaging modality in clinical diagnosis and research, yet its complexity and heterogeneity hinder scalable, generalizable machine learning. Although foundation models have revolutionized language and vision tasks, their application to MRI remains constrained by data scarcity and narrow anatomical focus. We present Decipher-MR, a 3D MRI-specific vision-language foundation model trained on 200,000 MRI series from over 22,000 studies spanning diverse anatomical regions, sequences, and pathologies. Decipher-MR integrates self-supervised vision learning with report-guided text supervision to build robust representations for broad applications. For efficient use, Decipher-MR adopts a modular design in which lightweight, task-specific decoders are tuned on top of a frozen pretrained encoder. In this setting, we evaluate Decipher-MR across disease classification, demographic prediction, anatomical localization, and cross-modal retrieval, demonstrating consistent improvements over existing foundation models and task-specific approaches. These results position Decipher-MR as a versatile foundation for MRI-based AI in clinical and research settings.
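The frozen-encoder, lightweight-decoder setting described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the "encoder" is a fixed random projection standing in for pretrained weights (it is not the Decipher-MR encoder), the data are synthetic vectors rather than MRI volumes, and the names `FROZEN_W`, `encode`, and `train_head` are hypothetical. The only point it makes is structural: embeddings are computed once from weights that never change, and only a small head is trained.

```python
import math
import random

random.seed(0)

INPUT_DIM = 16  # hypothetical size of a flattened input
EMBED_DIM = 8   # hypothetical embedding size

# Stand-in for a pretrained encoder: a fixed random projection plus tanh.
# Assumption: this only plays the role of frozen weights.
FROZEN_W = [[random.gauss(0, 1 / math.sqrt(INPUT_DIM)) for _ in range(INPUT_DIM)]
            for _ in range(EMBED_DIM)]


def encode(x):
    """Map an input vector to an embedding; FROZEN_W is never updated."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in FROZEN_W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(embeddings, labels, epochs=200, lr=0.3):
    """Fit a logistic-regression head on fixed embeddings by gradient descent."""
    w = [0.0] * EMBED_DIM
    b = 0.0
    n = len(embeddings)
    losses = []
    for _ in range(epochs):
        grad_w = [0.0] * EMBED_DIM
        grad_b = 0.0
        loss = 0.0
        for e, y in zip(embeddings, labels):
            p = sigmoid(sum(wi * ei for wi, ei in zip(w, e)) + b)
            p = min(max(p, 1e-12), 1 - 1e-12)  # guard the logarithms
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            for i in range(EMBED_DIM):
                grad_w[i] += (p - y) * e[i]
            grad_b += p - y
        # Only the head parameters (w, b) are updated.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        losses.append(loss / n)
    return w, b, losses


# Synthetic toy task: the label depends on the sign of the first input feature.
inputs = [[random.gauss(0, 1) for _ in range(INPUT_DIM)] for _ in range(64)]
labels = [1 if x[0] > 0 else 0 for x in inputs]

frozen_snapshot = [row[:] for row in FROZEN_W]
embeddings = [encode(x) for x in inputs]  # computed once, then reused
head_w, head_b, losses = train_head(embeddings, labels)
```

Because the embeddings are computed once and reused across epochs, the cost of adapting to a new task in this sketch is dominated by the small head rather than the encoder, which is the efficiency argument the modular design rests on.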
