
Multimodal Outer Arithmetic Block Dual Fusion of Whole Slide Images and Omics Data for Precision Oncology

Omnia Alwazzan, Amaya Gallagher-Syed, Thomas O. Millner, Sebastian Brandner, Ioannis Patras, Silvia Marino, Gregory Slabaugh

TL;DR

This work addresses the challenge of accurately subtyping CNS tumors by integrating DNA methylation data with histopathology from WSIs. It introduces MOAD-FNet, a dual fusion network that performs early patch-level fusion and late MOAB-based cross-modal fusion, and produces interpretable attention heatmaps of diagnostic regions. On brain tumor data from the NHNN (UK), MOAD-FNet achieves superior subtyping performance; on TCGA survival tasks it improves survival prediction on BLCA and is competitive on BRCA. Ablation studies show that both fusion stages are needed. The approach advances precision oncology by providing a scalable, interpretable framework that exploits complementary molecular and morphological information for more accurate diagnosis and prognostic assessment.

Abstract

The integration of DNA methylation data with Whole Slide Images (WSIs) offers significant potential for improving the diagnostic precision of central nervous system (CNS) tumor classification in neuropathology. While existing approaches typically integrate encoded omic data with histology at either an early or a late fusion stage, the potential of reintroducing omic data through dual fusion remains unexplored. In this paper, we propose using omic embeddings during both early and late fusion to capture complementary information, from local (patch-level) to global (slide-level) interactions, and thereby boost performance through multimodal integration. In the early fusion stage, omic embeddings are projected onto WSI patches in latent space, generating embeddings that encapsulate per-patch molecular and morphological information. This effectively incorporates omic information into the spatial representation of the WSI. These embeddings are then refined with a Multiple Instance Learning (MIL) gated attention mechanism that attends to diagnostic patches. In the late fusion stage, we reintroduce the omic data by fusing it with the slide-level omic-WSI embedding using a Multimodal Outer Arithmetic Block (MOAB), which richly intermingles features from both modalities to capture their correlations and complementarity. We demonstrate accurate CNS tumor subtyping across 20 fine-grained subtypes and validate our approach on benchmark datasets, achieving improved survival prediction on TCGA-BLCA and competitive performance on TCGA-BRCA compared to state-of-the-art methods. This dual fusion strategy enhances interpretability and classification performance, highlighting its potential for clinical diagnostics.
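The early fusion and attention steps described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: patch features are assumed to come from some pretrained patch encoder, the fused input for each patch is assumed to be the patch feature concatenated with the slide's omic vector, the MLP is a hypothetical two-layer ReLU network, and the pooling follows the gated-attention MIL formulation of Ilse et al. (2018). All sizes and weights are arbitrary placeholders.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Hypothetical two-layer ReLU MLP, applied row-wise (one row per patch)."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def early_fusion(patches, omic, params):
    """Assumed form of z_{i,j}: append the slide's omic vector to every patch
    feature, then map each row through a shared MLP to get p_{i,j}."""
    n = patches.shape[0]
    z = np.concatenate([patches, np.tile(omic, (n, 1))], axis=1)  # (n, d_patch + d_omic)
    return mlp(z, *params)                                        # (n, d_out)

def gated_attention_pool(p, V, U, w):
    """Gated-attention MIL pooling (Ilse et al., 2018): score each patch from
    tanh(pV) * sigmoid(pU), softmax over patches, then take a weighted sum."""
    gate = np.tanh(p @ V) * (1.0 / (1.0 + np.exp(-(p @ U))))  # (n, d_att)
    scores = gate @ w                                          # (n,)
    weights = np.exp(scores - scores.max())                    # numerically stable softmax
    a = weights / weights.sum()                                # attention over patches, sums to 1
    return a @ p, a                                            # slide feature v_i, per-patch scores

# Toy example with arbitrary sizes and random (untrained) weights.
rng = np.random.default_rng(0)
n_patches, d_patch, d_omic, d_hid, d_out, d_att = 50, 16, 8, 32, 24, 12
patches = rng.standard_normal((n_patches, d_patch))
omic = rng.standard_normal(d_omic)
params = (
    rng.standard_normal((d_patch + d_omic, d_hid)) * 0.1, np.zeros(d_hid),
    rng.standard_normal((d_hid, d_out)) * 0.1, np.zeros(d_out),
)
p = early_fusion(patches, omic, params)
V = rng.standard_normal((d_out, d_att)) * 0.1
U = rng.standard_normal((d_out, d_att)) * 0.1
w = rng.standard_normal(d_att)
v, a = gated_attention_pool(p, V, U, w)
print(p.shape, v.shape, a.shape)  # (50, 24) (24,) (50,)
```

Because the attention vector `a` is non-negative and sums to one over patches, it can be mapped back onto patch coordinates as a heatmap, and the slide feature `v` is simply the attention-weighted average of the fused patch embeddings.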

Paper Structure

This paper contains 19 sections, 8 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Overview of the proposed MOAD-FNet framework. Data engineering and encoding for each modality are performed in the preprocessing block. The early fusion block (top) receives encoded inputs from both modalities: the omic data is concatenated with the patch features to form a matrix $\mathbf{z}_{i,j}$, which is processed by an MLP encoder that learns a joint mapping with output $\mathbf{p}_{i,j}$. A gated-attention multiple instance learning (MIL) module scores patch importance, providing heatmap interpretability and producing a WSI feature $\mathbf{v}_i$. Next, the MOAD-FNet fusion block (bottom) reintroduces the omic features $\mathbf{o}_i$ alongside the representation $\mathbf{v}_i$ from the early fusion block as inputs to the MOAB fusion block. This block performs four outer arithmetic operations to create fusion representations, which are further reduced by $f_\theta$ before being passed to the final subtyping classifier.
  • Figure 2: Distribution of training and testing data points across 20 classes/subtypes. The bar chart illustrates the number of patients allocated to training and testing sets for each class, highlighting the balance of data used for model development and evaluation.
  • Figure 3: Visual representation of attention heatmap generated by MOAD-FNet for a diffuse glioma, IDH-mutant and 1p19q-retained (astroglial type) tumor (Class 0). (A) The original histology slide is displayed. (B) The heatmap shows areas of high attention (red) and low attention (blue), with regions of diagnostic relevance highlighted. (C) Representative patches with high attention are bordered in red, potentially indicating hallmark features of astroglial differentiation and cellular atypia crucial for diagnosis. (D) Representative patches with low attention are bordered in blue, reflecting regions of low tumor infiltration. The color bar illustrates the attention scale from high (red) to low (blue).
  • Figure 4: Comparison of F1-scores across different fusion methods for glioma classification. The radar chart illustrates the F1-score performance across all 20 classes, highlighting distinct patterns for Early Fusion, MOAD-FNet, and Late Fusion. The bar plots zoom in on specific glioblastoma and glioma classes, showing class-level performance variations across fusion methods. The box plot provides a summary of F1-score distributions, showcasing the variability and consistency of each fusion method.
  • Figure 5: Comparison of confusion matrices and t-SNE visualizations for three fusion strategies: (A) MOAD-FNet, (B) Late Fusion, and (C) Early Fusion for brain tumor subtyping. The corresponding t-SNE plots are labeled as (A.1), (B.1), and (C.1), respectively.
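The MOAB late-fusion block in Figure 1 combines the omic vector $\mathbf{o}_i$ with the slide feature $\mathbf{v}_i$ through four outer arithmetic operations. The sketch below assumes those four operations are outer addition, subtraction, multiplication and division, stacked as channels and mixed by a learned weight per channel (a 1x1-convolution analogue); the reduction $f_\theta$ is stood in for by a hypothetical linear projection, and the 20-way softmax head mirrors the subtyping task. It is an illustration with random weights, not the published architecture.

```python
import numpy as np

def outer_arithmetic(o, v, eps=1e-6):
    """Stack four outer arithmetic maps of two vectors into a (4, d_o, d_v) tensor."""
    v_safe = np.where(np.abs(v) < eps, eps, v)  # guard near-zero denominators
    return np.stack([
        o[:, None] + v[None, :],       # outer addition
        o[:, None] - v[None, :],       # outer subtraction
        o[:, None] * v[None, :],       # outer product
        o[:, None] / v_safe[None, :],  # outer division
    ])

def moab_fuse(o, v, channel_w, proj_w, proj_b):
    """Fuse omic vector o with slide feature v.
    channel_w: (4,) weights mixing the four maps (assumed 1x1-conv analogue).
    proj_w, proj_b: hypothetical linear stand-in for f_theta."""
    maps = outer_arithmetic(o, v)                  # (4, d_o, d_v)
    mixed = np.tensordot(channel_w, maps, axes=1)  # (d_o, d_v)
    return mixed.reshape(-1) @ proj_w + proj_b     # (d_fused,)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy example: random omic/slide vectors and untrained weights, 20 subtypes.
rng = np.random.default_rng(1)
d_o, d_v, d_fused, n_classes = 8, 24, 16, 20
o = rng.standard_normal(d_o)
v = rng.standard_normal(d_v)
maps = outer_arithmetic(o, v)
fused = moab_fuse(o, v, rng.standard_normal(4),
                  rng.standard_normal((d_o * d_v, d_fused)) * 0.01, np.zeros(d_fused))
W_cls = rng.standard_normal((d_fused, n_classes)) * 0.1
probs = softmax(fused @ W_cls)
print(maps.shape, fused.shape, probs.shape)  # (4, 8, 24) (16,) (20,)
```

Keeping the four outer maps as separate channels lets a learned weighting decide how much each type of pairwise interaction contributes; the division map is the least numerically stable of the four, hence the guard on near-zero denominators.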