Navigating the landscape of multimodal AI in medicine: a scoping review on technical challenges and clinical applications

Daan Schouten, Giulia Nicoletti, Bas Dille, Catherine Chia, Pierpaolo Vendittelli, Megan Schuurmans, Geert Litjens, Nadieh Khalili

TL;DR

This scoping review examines the landscape of deep learning-based multimodal AI applications across the medical domain, analyzing 432 papers published between 2018 and 2024, and finds that multimodal AI models consistently outperform their unimodal counterparts.

Abstract

Recent technological advances in healthcare have led to unprecedented growth in patient data quantity and diversity. While artificial intelligence (AI) models have shown promising results in analyzing individual data modalities, there is increasing recognition that models integrating multiple complementary data sources, so-called multimodal AI, could enhance clinical decision-making. This scoping review examines the landscape of deep learning-based multimodal AI applications across the medical domain, analyzing 432 papers published between 2018 and 2024. We provide an extensive overview of multimodal AI development across different medical disciplines, examining various architectural approaches, fusion strategies, and common application areas. Our analysis reveals that multimodal AI models consistently outperform their unimodal counterparts, with an average improvement of 6.2 percentage points in AUC. However, several challenges persist, including cross-departmental coordination, heterogeneous data characteristics, and incomplete datasets. We critically assess the technical and practical challenges in developing multimodal AI systems and discuss potential strategies for their clinical implementation, including a brief overview of commercially available multimodal AI models for clinical decision-making. Additionally, we identify key factors driving multimodal AI development and propose recommendations to accelerate the field's maturation. This review provides researchers and clinicians with a thorough understanding of the current state, challenges, and future directions of multimodal AI in medicine.

Paper Structure

This paper contains 28 sections and 4 figures.

Figures (4)

  • Figure 1: Overview of the screening process.
  • Figure 2: Overview of the data modalities used in the reviewed articles. (A) Distribution of articles by year. The bar chart shows an exponential increase in the number of studies per year from 2018 to 2024; by extrapolation, the number of multimodal medical AI studies is expected to reach 199 by the end of 2024. (B) Pie chart showing the proportions of the modality groups and the data modalities within each group across studies. (C) Stacked bar chart illustrating the growth of data modality groups over the years. Note that the values in this chart are counts of individual data modality uses, and a single article can use multiple modalities. (D) Diagram showing how data modalities are combined per model. It captures the unique modality combinations in each model presented by the individual articles. Numbers in brackets indicate the total number of models per category, whereas numbers without brackets give the number of models for each combination, visualized by the ribbon bands between the vertical nodes. Most models used two data modalities, and a smaller portion used three or four. Three multimodal models used data modalities grouped under the "other non-image" category as defined in this review.
  • Figure 3: A deeper dive into the medical tasks and data sources of the review. The numbers on the bars indicate the total per category. (A) Top: The number of articles per organ system. Bottom: Distribution of medical tasks across organ systems. The pie charts show that diagnosis is the most prevalent medical task in studies of all organ systems. (B) Usage of data sources in this review. Note that the values in the chart are total counts of uses across all reviewed studies, and a single study can refer to multiple data sources. About 61% of the total uses were sourced from data portals (e.g. TCGA, ADNI), 15% from research data shared publicly by publications, and 24% from private datasets that were not made public. (C) Distribution of public data sources (excluding private datasets) across the studies of organ systems. Similarly, the nervous and respiratory systems lead in the number of public data uses. A detailed breakdown of these public data sources can be found in the supplementary materials.
  • Figure 4: Simplified schematic view of the different fusion stages.
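
Figure 4 itself is not reproduced here, so the following is only an illustrative sketch of what "fusion stages" commonly refers to. It assumes the widely used early / intermediate / late taxonomy, which may not match the figure exactly. All names (`image_features`, `tabular_features`, `linear_head`) and the random data are hypothetical, and the fixed random linear maps stand in for trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-patient inputs for two modalities (4 patients).
image_features = rng.normal(size=(4, 8))    # e.g. features derived from an image
tabular_features = rng.normal(size=(4, 3))  # e.g. clinical variables


def linear_head(x, out_dim, seed):
    """Toy stand-in for a trained model: a fixed random linear map."""
    w = np.random.default_rng(seed).normal(size=(x.shape[1], out_dim))
    return x @ w


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Early fusion: concatenate the inputs, then apply a single model.
early_input = np.concatenate([image_features, tabular_features], axis=1)
early_pred = sigmoid(linear_head(early_input, 1, seed=1))

# Intermediate fusion: encode each modality separately, then fuse the
# resulting representations before the prediction head.
image_latent = np.tanh(linear_head(image_features, 4, seed=2))
tabular_latent = np.tanh(linear_head(tabular_features, 4, seed=3))
joint_latent = np.concatenate([image_latent, tabular_latent], axis=1)
intermediate_pred = sigmoid(linear_head(joint_latent, 1, seed=4))

# Late fusion: one model per modality, then combine their predictions
# (here by simple averaging).
image_pred = sigmoid(linear_head(image_features, 1, seed=5))
tabular_pred = sigmoid(linear_head(tabular_features, 1, seed=6))
late_pred = (image_pred + tabular_pred) / 2.0
```

The three variants differ only in where the modalities meet: at the input, at a learned representation, or at the prediction. That is the distinction a schematic of fusion stages typically draws.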