Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment

Nazanin Moradinasab; Saurav Sengupta; Jiebei Liu; Sana Syed; Donald E. Brown

TL;DR

MoSARe tackles the challenge of incomplete multimodal healthcare data by unifying a Mixture of Experts, symmetric cross-modal alignment, and a decoupled reconstruction mechanism to robustly fuse histopathology, RNA-seq, and clinical text for cancer subtyping. The framework comprises modality-tailored preprocessing, adaptive cross-modal attention, and a reconstruction-enabled MoE fusion complemented by contrastive alignment (SymCL and MCL). Empirical results on TCGA BRCA, RCC, and NSCLC show MoSARe outperforming state-of-the-art methods in complete-data scenarios and exhibiting strong robustness to missing modalities, including scenarios where data are partially masked during training and testing. The work demonstrates practical impact for real-world, resource-limited healthcare settings by enabling reliable multimodal diagnosis despite incomplete records, and it provides interpretable insights via cross-modal attention heatmaps.

Abstract

Healthcare relies on multiple types of data, such as medical images, genetic information, and clinical records, to improve diagnosis and treatment. However, missing data is a common challenge due to privacy restrictions, cost, and technical issues, making many existing multimodal models unreliable. To address this, we propose a new multimodal model called Mixture of Experts, Symmetric Aligning, and Reconstruction (MoSARe), a deep learning framework that handles incomplete multimodal data while maintaining high accuracy. MoSARe integrates expert selection, cross-modal attention, and contrastive learning to improve feature representation and decision-making. Our results show that MoSARe outperforms existing models when the data are complete. Furthermore, it provides reliable predictions even when some data are missing. This makes it especially useful in real-world healthcare settings, including resource-limited environments. Our code is publicly available at https://github.com/NazaninMn/MoSARe.
