MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer

Chengyu Wu, Chengkai Wang, Yaqi Wang, Huiyu Zhou, Yatao Zhang, Qifeng Wang, Shuai Wang

TL;DR

MMFusion addresses the challenge of diagnosing lymph node metastasis (LNM) in esophageal squamous cell carcinoma (ESCC) by fusing CT-derived tumor and lymph node imaging with clinical, hematology, and radiomics data. The framework combines a multi-tissue masked relational learning (MMRL) strategy, a heterogeneous graph aggregation (HGA) module, and a conditional feature-guided diffusion (CFD) process to model multi-modal interactions while suppressing redundant information. Key contributions include a dataset of 1,354 ESCC cases, an architecture that uncovers prognostic relationships across tissues, and ablation evidence showing performance gains over state-of-the-art methods. The approach has the potential to support objective, data-driven decisions in ESCC management through robust multi-modal fusion and diffusion-based refinement.

Abstract

Esophageal cancer is one of the most common types of cancer worldwide and ranks sixth in cancer-related mortality. Accurate computer-assisted diagnosis of cancer progression can help physicians tailor personalized treatment plans. CT-based cancer diagnosis methods have received much attention because they provide a comprehensive view of a patient's condition. However, multi-modal methods can introduce information redundancy, which degrades performance. In addition, efficient and effective interactions between multi-modal representations remain underexplored, and existing work offers little insight into the prognostic correlations among multi-modality features. In this work, we introduce a multi-modal heterogeneous graph-based conditional feature-guided diffusion model for lymph node metastasis diagnosis based on CT images as well as clinical measurements and radiomics data. To explore the intricate relationships between multi-modal features, we construct a heterogeneous graph. Following this, a conditional feature-guided diffusion approach is applied to eliminate information redundancy. Moreover, we propose a masked relational representation learning strategy that aims to uncover the latent prognostic correlations and priorities of primary tumor and lymph node image representations. Experimental results validate the effectiveness of our proposed method. The code is available at https://github.com/wuchengyu123/MMFusion.
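
As a rough illustration of the diffusion component mentioned in the abstract, the sketch below shows the forward (noising) step of a standard DDPM-style process applied to a generic feature matrix. This is an assumption for illustration only: the abstract does not specify the noise schedule, the number of time steps, or which features are diffused, and the conditional denoising network is omitted. The names `linear_beta_schedule` and `forward_diffuse` are hypothetical and do not come from the MMFusion repository.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule commonly used in DDPM (assumed, not from the paper)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

# Cumulative product of (1 - beta_t) gives the signal-retention factor alpha_bar_t.
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 16))  # stand-in for 4 fused feature vectors of size 16

x_early, _ = forward_diffuse(x0, 0, alpha_bar, rng)    # almost unchanged
x_late, _ = forward_diffuse(x0, 999, alpha_bar, rng)   # close to pure noise
```

In a conditional setup, a denoiser would be trained to predict `eps` from `x_t`, the time step, and a conditioning feature; how MMFusion builds that condition is not described in this section.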

Paper Structure (17 sections, 7 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Overview of the proposed MMFusion framework. First, the MMRL strategy extracts correlations from the image representations. Next, HGA facilitates multi-modal feature interaction. Finally, the CFD method eliminates feature redundancy.
  • Figure 2: Architecture of our proposed Multi-tissue Masked Relational Representation Learning (MMRL) strategy.
  • Figure 3: t-SNE visualization of the denoised multi-modal feature embeddings output by HGA. Red and blue denote metastasis and non-metastasis, respectively. As the time step encoding process advances, feature redundancy is gradually eliminated, yielding more clearly separated distributions and a lower DB score, which indicates that our model can effectively perform LNM diagnosis.
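
The "DB score" in the Figure 3 caption is not defined in this section. Assuming it refers to the Davies-Bouldin index, a common cluster-separation measure where lower values mean tighter, better-separated clusters, a minimal NumPy sketch of that index looks like the following. The function name `davies_bouldin` and the synthetic data are illustrative only and are not taken from the paper or its code.

```python
import numpy as np

def davies_bouldin(features, labels):
    """Davies-Bouldin index (assumed meaning of 'DB score'); lower is better."""
    classes = np.unique(labels)
    # Centroid of each class in feature space.
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Scatter: mean Euclidean distance from each sample to its class centroid.
    scatter = np.array([
        np.linalg.norm(features[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    k = len(classes)
    worst = np.zeros(k)
    for i in range(k):
        # Similarity of class i to every other class; keep the worst case.
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(k) if j != i
        ]
        worst[i] = max(ratios)
    return float(worst.mean())

# Synthetic two-class example: metastasis (1) vs. non-metastasis (0).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
noise = rng.standard_normal((200, 8))

separated = noise + labels[:, None] * 5.0    # class centroids far apart
overlapping = noise + labels[:, None] * 0.5  # class centroids nearly coincide

db_separated = davies_bouldin(separated, labels)
db_overlapping = davies_bouldin(overlapping, labels)
```

Under this reading, a falling DB score across time steps would correspond to the metastasis and non-metastasis embeddings drifting into more compact, better-separated groups.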