Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review

Asim Waqas; Aakash Tripathi; Ravi P. Ramachandran; Paul Stewart; Ghulam Rasool

TL;DR

The review addresses how deep neural networks enable multimodal data integration in oncology, emphasizing Graph Neural Networks and Transformers as core tools for fusing radiology, pathology, genomics, and clinical data. It presents a structured taxonomy for multimodal learning, surveys modality-specific datasets and cancer applications, and analyzes pre-, intra-, and post-learning fusion strategies. Key contributions include a comprehensive taxonomy, synthesis of GNN/Transformer-based MML in oncology, and a roadmap of challenges (data availability, alignment, missing data, generalization, explainability, privacy) with practical implications. The work articulates how scalable, interpretable, and uncertainty-aware multimodal frameworks can advance cancer prevention, early detection, and personalized treatment across diverse data sources and cancer types.

Abstract

Cancer has relational information residing at varying scales, modalities, and resolutions of the acquired data, such as radiology, pathology, genomics, proteomics, and clinical records. Integrating diverse data types can improve the accuracy and reliability of cancer diagnosis and treatment. There can be disease-related information that is too subtle for humans or existing technological tools to discern visually. Traditional methods typically focus on partial or unimodal information about biological systems at individual scales and fail to encapsulate the complete spectrum of the heterogeneous nature of data. Deep neural networks have facilitated the development of sophisticated multimodal data fusion approaches that can extract and integrate relevant information from multiple sources. Recent deep learning frameworks such as Graph Neural Networks (GNNs) and Transformers have shown remarkable success in multimodal learning. This review article provides an in-depth analysis of the state-of-the-art in GNNs and Transformers for multimodal data fusion in oncology settings, highlighting notable research studies and their findings. We also discuss the foundations of multimodal learning, inherent challenges, and opportunities for integrative learning in oncology. By examining the current state and potential future developments of multimodal data integration in oncology, we aim to demonstrate the promising role that multimodal neural networks can play in cancer prevention, early detection, and treatment through informed oncology practices in personalized settings.

Paper Structure (58 sections, 3 equations, 9 figures, 1 table)

This paper contains 58 sections, 3 equations, 9 figures, 1 table.

Fundamentals of Multimodal Learning (MML)
Data Modalities in Oncology
Molecular Data
Imaging Data
Clinical Data
Taxonomy of MML
Pre-processing
Feature Extraction
Data Fusion
Primary Learner
Final Classifier
Data Fusion Strategies
Early Fusion
Intermediate Fusion
Late Fusion
...and 43 more sections

Figures (9)

Figure 1: Number of publications involving DL, GNNs, GNNs in the medical domain, multimodal overall, and multimodal in biomedical and clinical sciences over the period 2015-2024 [dimensions].
Figure 2: We present various data modalities that capture specific aspects of cancer at different scales. For example, radiological images capture organ or sub-organ level abnormalities, while tissue analysis may reveal changes in cellular structure and morphology. Molecular data types, in turn, may provide insights into genetic mutations and epigenetic changes.
Figure 3: Taxonomy, stages, and techniques of multimodal data fusion are presented. Early, late, and cross-modality fusion methods integrate individual data modalities (or extracted features) before, after, or at the primary learning step, respectively.
Figure 4: (a) The commonly occurring graph types are presented, including (1) undirected and directed, (2) homogeneous and heterogeneous, (3) dynamic and static, (4) attributed (edges) and unattributed. (b) Three different types of tasks performed using the graph data are presented and include (1) node-level, (2) link-level, and (3) graph-level analyses. (c) Various categories of representation learning for graphs are presented.
Figure 5: Convolution operation for graphs vs. image data. The canonical order of the input is important in CNNs, whereas in GNNs, the order of the input nodes is not important. From the convolution operation perspective, CNNs can be considered a subset of GNNs [hamilton2020graph].
...and 4 more figures
