Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology

Huahui Yi; Xiaofei Wang; Kang Li; Chao Li

TL;DR

The paper addresses precision neuro-oncology by fusing histopathology and genomics through Unified Modeling Enhanced Multimodal Learning (UMEML), which uses a hierarchical attention structure to capture both shared and complementary information. The framework consists of two unimodal encoders (pathology and genomics) and a Unified Multimodal Decoder. It adds three components: a query-based cross-attention that clusters pathology patches into prototypes; a prototype assignment trained with a modularity loss $\mathcal{L}_{\text{modularity}} = -\frac{1}{2e}\big(\alpha \mathrm{Tr}(W (S^p)^T S^p) + \beta \mathrm{Tr}(W (S^g)^T S^g)\big)$, combined into a total loss $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{objective}} + \gamma \mathcal{L}_{\text{modularity}}$; and a registration mechanism with learnable tokens. On TCGA GBM-LGG, it reports state-of-the-art results across glioma grading, classification, and survival prediction (e.g., grading Acc 0.7756, AUC 0.9212; classification Acc 0.7514, AUC 0.9594; survival c-index 0.8396). Ablation studies confirm the importance of the modularity loss, the Unified Multimodal Decoder, and the register tokens. The work advances multimodal fusion for precision neuro-oncology and points to handling missing modalities as future work.
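To make the two loss formulas above concrete, here is a minimal dependency-free sketch that evaluates them on small matrices. It is an illustration under stated assumptions, not the authors' implementation. The summary does not give the shapes of $W$, $S^p$, and $S^g$, so the sketch assumes $W$ is an $n \times n$ affinity matrix and each assignment matrix has $n$ columns, which makes $W S^T S$ square. The scalar `e` is passed in as given, and the function names and default weights are hypothetical.

```python
def trace_w_sts(W, S):
    """Compute Tr(W S^T S) for list-of-lists matrices.

    Assumption (not stated in the summary): W is n x n and S has n columns,
    so that S^T S is n x n and the product with W is square.
    """
    n = len(W)
    # Tr(W M) = sum_{i,j} W[i][j] * M[j][i], with M = S^T S,
    # and M[j][i] = sum over rows r of S[r][j] * S[r][i].
    return sum(
        W[i][j] * row[j] * row[i]
        for row in S
        for i in range(n)
        for j in range(n)
    )


def modularity_loss(W, S_p, S_g, e, alpha=1.0, beta=1.0):
    """L_modularity = -(alpha * Tr(W S_p^T S_p) + beta * Tr(W S_g^T S_g)) / (2e).

    alpha and beta default to 1.0 purely for illustration.
    """
    weighted = alpha * trace_w_sts(W, S_p) + beta * trace_w_sts(W, S_g)
    return -weighted / (2.0 * e)


def total_loss(objective, modularity, gamma):
    """L_total = L_objective + gamma * L_modularity."""
    return objective + gamma * modularity
```

Because the modularity term enters with a negative sign, minimizing the total loss pushes both traces up, which rewards assignments that agree with the affinity structure in $W$ for both modalities.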

Abstract

Multimodal learning, integrating histology images and genomics, promises to enhance precision oncology with comprehensive views at microscopic and molecular levels. However, existing methods may not sufficiently model the shared or complementary information for more effective integration. In this study, we introduce a Unified Modeling Enhanced Multimodal Learning (UMEML) framework that employs a hierarchical attention structure to effectively leverage shared and complementary features of both modalities, histology and genomics. Specifically, to mitigate unimodal bias from modality imbalance, we utilize a query-based cross-attention mechanism for prototype clustering in the pathology encoder. Our prototype assignment and modularity strategy are designed to align shared features and minimize modality gaps. An additional registration mechanism with learnable tokens is introduced to enhance cross-modal feature integration and robustness in multimodal unified modeling. Our experiments demonstrate that our method surpasses previous state-of-the-art approaches in glioma diagnosis and prognosis tasks, underscoring its superiority in precision neuro-oncology.
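The query-based cross-attention and the register tokens described above can be sketched roughly as follows. This is a simplified illustration under assumptions, not the published architecture: learned projection matrices, multiple heads, and normalization layers are omitted, the scaled dot-product form is the standard attention formulation rather than a detail confirmed by this summary, and the function names and the concatenation order are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]


def cross_attention_prototypes(queries, patches):
    """Pool N patch features into K prototypes, one per query vector.

    queries: K vectors of length d (learnable in a real model).
    patches: N vectors of length d (pathology patch features).
    Each prototype is an attention-weighted average of the patches.
    """
    d = len(queries[0])
    prototypes = []
    for q in queries:
        scores = [
            sum(qi * pi for qi, pi in zip(q, p)) / math.sqrt(d)
            for p in patches
        ]
        weights = softmax(scores)
        prototypes.append(
            [sum(w * p[c] for w, p in zip(weights, patches)) for c in range(d)]
        )
    return prototypes


def build_decoder_input(path_protos, gene_protos, registers):
    """Concatenate prototypes with register tokens into one token sequence.

    The ordering here is an assumption made for illustration.
    """
    return path_protos + gene_protos + registers
```

Reducing thousands of patches to a fixed number of prototypes keeps the pathology token count comparable to the genomic token count, which is one plausible reading of how the mechanism mitigates modality imbalance.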

Paper Structure (11 sections, 9 equations, 3 figures, 2 tables)

Introduction
Methodology
Problem Formulation
Overall Structure
Prototype Assignment and Modularity
Registration Mechanism
Experiments
Datasets and Experiments Setting
Performance Evaluation
Ablation Studies
Conclusion

Figures (3)

Figure 1: Illustration of multimodal fusion methods.
Figure 2: The proposed UMEML framework. Histopathology-genomic pairs pass through two unimodal encoders to derive prototypes. The Assignment and Modularity Module refines the prototypes using an Affinity Graph, concatenates them with noise-mitigating registers, and feeds them into a Unified Multimodal Decoder for unified representation modeling in downstream tasks.
Figure 3: Comparison of our method against other methods: ROC curves for glioma grading and classification, and time-dependent AUC curves for survival prediction.
