Health system learning achieves generalist neuroimaging models

Akhil Kondepudi, Akshay Rao, Chenhui Zhao, Yiwei Lyu, Samir Harake, Soumyanil Banerjee, Rushikesh Joshi, Anna-Katharina Meissner, Renly Hou, Cheng Jiang, Asadur Chowdury, Ashok Srinivasan, Brian Athey, Vikas Gulani, Aditya Pandey, Honglak Lee, Todd Hollon

TL;DR

NeuroVFM, a visual foundation model trained on 5.24 million clinical MRI and CT volumes using a scalable volumetric joint-embedding predictive architecture, establishes health system learning as a paradigm for building generalist medical AI and provides a scalable framework for clinical foundation models.

Abstract

Frontier artificial intelligence (AI) models, such as OpenAI's GPT-5 and Meta's DINOv3, have advanced rapidly through training on internet-scale public data, yet such systems lack access to private clinical data. Neuroimaging, in particular, is underrepresented in the public domain due to identifiable facial features within MRI and CT scans, fundamentally restricting model performance in clinical medicine. Here, we show that frontier models underperform on neuroimaging tasks and that learning directly from uncurated data generated during routine clinical care at health systems, a paradigm we call health system learning, yields high-performance, generalist neuroimaging models. We introduce NeuroVFM, a visual foundation model trained on 5.24 million clinical MRI and CT volumes using a scalable volumetric joint-embedding predictive architecture. NeuroVFM learns comprehensive representations of brain anatomy and pathology, achieving state-of-the-art performance across multiple clinical tasks, including radiologic diagnosis and report generation. The model exhibits emergent neuroanatomic understanding and interpretable visual grounding of diagnostic findings. When paired with open-source language models through lightweight visual instruction tuning, NeuroVFM generates radiology reports that surpass frontier models in accuracy, clinical triage, and expert preference. Through clinically grounded visual understanding, NeuroVFM reduces hallucinated findings and critical errors, offering safer clinical decision support. These results establish health system learning as a paradigm for building generalist medical AI and provide a scalable framework for clinical foundation models.

Paper Structure

This paper contains 4 sections, 2 equations, 38 figures.

Figures (38)

  • Figure 1: Overview of Health System Learning with NeuroVFM. a, Health system learning directly models the data generating process of clinical operations at large health systems. The UM-NeuroImages dataset comprises 5.24 million volumes from 566,915 studies, acquired over 20 years at Michigan Medicine. Age and sex distribution are shown below. b, NeuroVFM was trained using Vol-JEPA, a scalable volumetric self-supervised method that learns a unified latent space for CT and MRI. A 3D volume is partitioned into a small context and larger masked target with the background removed. The context is encoded by an online (student) 3D vision transformer. A predictor combines context latents with position encodings of the masked target region to predict the masked region latents. Ground truth latents for the masked region are generated by an offline (teacher) encoder updated by an exponential moving average, with gradients stopped through the teacher. Training minimizes the distance between the predicted and teacher latents using a smooth L1 loss. c, At inference, NeuroVFM encodes all volumes in a neuroimaging study into latent visual tokens for downstream tasks. The same visual tokens can be used to fine-tune an open-source multimodal language model (i.e., Qwen3-14B, LLaVA-1.5-style) to generate radiology reports. Illustrative findings and corresponding grounded attention maps are shown. The findings can then be passed to a frontier reasoning LLM (i.e., GPT-5-thinking) for interpretation and triage.
  • Figure 1: Extended NeuroVFM workflow. Caption on next page.
  • Figure 1: SQL query to retrieve neuroimaging studies. Using the SDW, we retrieved all CT and MRI studies whose body part or description contained “brain,” “head,” “orbit(s),” or “neck,” returning exam metadata and report text. The SQL query filters studies acquired before June 1 2023 for model development. The held-out prospective set applies the same query on and after that date.
  • Figure 1: Descriptive characteristics of the UM-NeuroImages prospective test set. The temporally held-out UM-NeuroImages test set comprises 50,293 CT and MRI studies. For each study, the table reports modality, scanner manufacturer, acquisition site, patient age at scan (with ages $\geqslant$90 years truncated to 90 years due to PHI constraints), sex, ethnicity, and MRI field strength (when applicable). This large table is available in the accompanying spreadsheet.
  • Figure 2: NeuroVFM results. a, NeuroVFM performance over 82 CT and 74 MRI diagnostic tasks, compared with both health system-scale (HLIP) and internet-scale (DINOv3, BiomedCLIP) models. NeuroVFM outperforms models trained with language supervision and those trained on public internet data. Results are mean $\pm$ 95% CI. b, NeuroVFM exhibits foundation model behavior, with performance scaling across data volume and model size. c, Performance across diagnostic ontologies (traumatic, congenital, ischemic lesion, etc.) is shown for both CT and MRI. NeuroVFM consistently outperforms other baselines. d, NeuroVFM was tested on CT and MRI-based external benchmarks, including Alzheimer's disease, Parkinson's disease, and autism classification, as well as intracranial hemorrhage detection. NeuroVFM outperformed internet-scale models by a wide margin. e, We discovered an empirical log-linear scaling relationship between the number of positive training examples and model performance. This relationship held across at least 4 orders of magnitude, imaging modalities, and models (Extended Data Figs. 4 and 5).
  • ...and 33 more figures
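
The Figure 1b caption names two ingredients of Vol-JEPA training: a smooth L1 loss between predicted and teacher latents, and a teacher encoder updated by an exponential moving average (EMA) of the student. The following is a minimal pure-Python sketch of those two pieces only, not the authors' implementation. The function names `smooth_l1` and `ema_update`, the `beta` threshold, and the `momentum` value are illustrative assumptions, and flat lists of floats stand in for latent vectors and encoder weights.

```python
from typing import List


def smooth_l1(pred: List[float], target: List[float], beta: float = 1.0) -> float:
    """Mean smooth L1 distance between predicted and teacher latents.

    Quadratic for small errors (|diff| < beta), linear otherwise.
    `beta` is an assumed hyperparameter, not a value from the paper.
    """
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total / len(pred)


def ema_update(teacher: List[float], student: List[float],
               momentum: float = 0.99) -> List[float]:
    """Move teacher weights toward the student by an exponential moving average.

    The teacher receives no gradients; it changes only through this update.
    `momentum` is an assumed value for illustration.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Toy usage: the targets come from the (hypothetical) teacher, the
# predictions from the student-plus-predictor path.
predicted_latents = [0.0, 2.0]
teacher_latents = [0.5, 0.0]
loss = smooth_l1(predicted_latents, teacher_latents)

# After a student optimizer step, the teacher is refreshed without gradients.
teacher_weights = ema_update([1.0, 1.0], [0.0, 2.0], momentum=0.9)
```

In a real training loop the loss would be backpropagated through the student encoder and predictor only, and the EMA update would run once per step over all teacher parameters.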