MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer
Chengyu Wu, Chengkai Wang, Yaqi Wang, Huiyu Zhou, Yatao Zhang, Qifeng Wang, Shuai Wang
TL;DR
MMFusion addresses the challenge of diagnosing lymph node metastasis in esophageal squamous cell carcinoma (ESCC) by fusing CT-derived tumor and lymph node imaging with clinical, hematology, and radiomics data. The framework combines a multi-tissue masked relational learning (MMRL) strategy, a heterogeneous graph aggregation (HGA) module, and a conditional feature-guided diffusion (CFD) process to model multi-modal interactions while suppressing redundant information. Key contributions include a dataset of 1,354 ESCC cases, an architecture that uncovers prognostic relationships across tissues, and ablation evidence of performance gains over state-of-the-art methods. The approach has the potential to support more objective, data-driven decisions in ESCC management through robust multi-modal fusion and diffusion-based refinement.
Abstract
Esophageal cancer is one of the most common types of cancer worldwide and ranks sixth in cancer-related mortality. Accurate computer-assisted diagnosis of cancer progression can help physicians tailor personalized treatment plans. CT-based cancer diagnosis methods have received much attention for their ability to examine patients' conditions comprehensively. However, multi-modal methods may introduce information redundancy, leading to underperformance. In addition, efficient and effective interactions between multi-modal representations remain underexplored, and existing work lacks insightful analysis of the prognostic correlations among multi-modal features. In this work, we introduce a multi-modal heterogeneous graph-based conditional feature-guided diffusion model for lymph node metastasis diagnosis based on CT images as well as clinical measurements and radiomics data. To explore the intricate relationships between multi-modal features, we construct a heterogeneous graph. Following this, a conditional feature-guided diffusion approach is applied to eliminate information redundancy. Moreover, we propose a masked relational representation learning strategy that aims to uncover the latent prognostic correlations and priorities of primary tumor and lymph node image representations. Experimental results validate the effectiveness of our proposed method. The code is available at https://github.com/wuchengyu123/MMFusion.
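For readers unfamiliar with diffusion on feature vectors, the following is a minimal sketch of the standard DDPM forward (noising) step applied to a fused feature vector. It is not the authors' CFD module: the linear noise schedule, the function names, and the placeholder feature values are all assumptions made for illustration. In a conditional feature-guided setting, a learned denoiser conditioned on other features would be trained to reverse this process; that part is not shown here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return a linearly spaced noise schedule beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for every step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


# Hypothetical fused feature vector (placeholder values, not real patient data).
fused_feature = [0.5, -1.2, 0.3, 2.0]

betas = linear_beta_schedule(num_steps=1000)
alpha_bars = cumulative_alpha_bar(betas)
rng = random.Random(0)  # fixed seed so the sketch is reproducible

x_early = forward_noise(fused_feature, 10, alpha_bars, rng)   # mostly signal
x_late = forward_noise(fused_feature, 999, alpha_bars, rng)   # almost pure noise
```

At early steps the sample stays close to the original feature, while at the final step almost no signal remains. The reverse (denoising) process is where conditioning would enter.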
