MedVersa: A Generalist Foundation Model for Medical Image Interpretation
Hong-Yu Zhou, Julián Nicolás Acosta, Subathra Adithan, Suvrankar Datta, Eric J. Topol, Pranav Rajpurkar
TL;DR
MedVersa introduces a generalist medical foundation model that uses an optimizable LLM as an orchestrator to coordinate diverse vision modules for multimodal medical image interpretation. Trained on tens of millions of medical instances, it achieves state-of-the-art or competitive performance across radiology report generation, vision-centric tasks, and multiple external cohorts. Radiologist and user studies indicate that MedVersa-generated reports are often equivalent or superior to human-written reports and can substantially reduce reporting time, highlighting tangible clinical workflow benefits. The work demonstrates the viability of extensible, multimodal generalist AI in clinical radiology and outlines a pathway toward broader modality coverage and continual learning.
Abstract
Current medical AI systems are often limited to narrow applications, hindering widespread adoption. We present MedVersa, a generalist foundation model trained on tens of millions of compiled medical instances. MedVersa unlocks generalist learning from multimodal inputs and outputs, representing the first example of a generalist model reaching competitive performance with leading specialized solutions across a variety of medical imaging scenarios. MedVersa achieves state-of-the-art performance in nine tasks, sometimes outperforming counterparts by over 10%. Radiologist evaluation shows that MedVersa-generated reports are rated superior in 95% of normal studies and match or exceed human reports in 71% of cases overall. User studies show notable reductions in report writing time and in discrepancies when MedVersa is used. Our findings underscore the value of flexible, multimodal AI systems in advancing medical image interpretation and supporting clinical expertise.
