MultiMed: Massively Multimodal and Multitask Medical Understanding
Shentong Mo, Paul Pu Liang
TL;DR
MultiMed tackles the scarcity of large-scale, diverse multimodal medical datasets by introducing a benchmark with 2.56 million samples across 10 modalities and 11 tasks for evaluating unimodal, multimodal, and multitask learning. The authors formalize notation and fusion strategies and demonstrate that multimodal multitask models achieve superior performance and robustness, including zero-shot and few-shot generalization, across a wide range of medical problems. Key contributions include a dataset design that is diverse in organs/cells, modalities, and tasks; comprehensive experiments showing clear gains from modality integration; and analyses of generalization, robustness, and novel modality combinations, with implications for personalized medicine and clinical decision support. The work positions MultiMed as a scalable, community-driven platform for advancing generalist biomedical AI, with attention to potential biases, fairness, and real-world deployment considerations.
Abstract
Biomedical data is inherently multimodal, consisting of electronic health records, medical imaging, digital pathology, genome sequencing, wearable sensors, and more. The application of artificial intelligence tools to these multifaceted sensing technologies has the potential to revolutionize the prognosis, diagnosis, and management of human health and disease. However, current approaches to biomedical AI typically train and evaluate on only one or a small set of medical modalities and tasks. This limitation hampers the development of comprehensive tools that can leverage the rich, interconnected information across many heterogeneous biomedical sensors. To address this challenge, we present MultiMed, a benchmark designed to evaluate and enable large-scale learning across a wide spectrum of medical modalities and tasks. MultiMed consists of 2.56 million samples across ten medical modalities, such as medical reports, pathology, genomics, and protein data, and is structured into eleven challenging tasks, including disease prognosis, protein structure prediction, and medical question answering. Using MultiMed, we conduct comprehensive experiments benchmarking state-of-the-art unimodal, multimodal, and multitask models. Our analysis highlights the advantages of training large-scale medical models across many related modalities and tasks. Moreover, MultiMed enables studies of generalization across related medical concepts, robustness to real-world noisy data and distribution shifts, and novel modality combinations to improve prediction performance. MultiMed will be publicly available and regularly updated, and welcomes input from the community.
