Multi-modal Evidential Fusion Network for Trustworthy PET/CT Tumor Segmentation

Yuxuan Qi, Li Lin, Jiajun Wang, Bin Zhang, Jingya Zhang

TL;DR

This work tackles accurate and reliable tumor segmentation in PET/CT by addressing modality uncertainty through a two-stage framework, Multi-modal Evidential Fusion Network (MEFN). It combines Cross-Modal Feature Learning (CFL) with GAN-based modality translation and Tumor Guided Attention to align PET and CT features, followed by Multi-modal Trustworthy Fusion (MTF) that leverages Dual-attention Feature Calibrating (DFC) and Dempster-Shafer Theory-based fusion (DTF), augmented by an Uncertainty Perceptual Loss. The method yields state-of-the-art DSC improvements on AutoPET ($+3.10\%$) and Hecktor ($+3.23\%$), while providing interpretable segmentation uncertainty via Dirichlet-evidence fusion, enabling radiologists to gauge trust in automated results. By explicitly modeling uncertainty and fusing modalities with evidence theory, MEFN offers a clinically meaningful approach to robust PET/CT segmentation in the presence of modality quality variations and domain gaps.
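The DSC (Dice similarity coefficient) gains quoted above measure overlap between a predicted mask and the ground-truth mask. As a point of reference, a minimal sketch of the standard Dice formula on flattened binary masks might look like the following. The function name `dice_score`, the `eps` smoothing term, and the convention of returning 1.0 for two empty masks are illustrative assumptions, not details taken from the paper.

```python
def dice_score(pred, target, eps=1e-8):
    """Dice coefficient 2|P∩T| / (|P| + |T|) for flattened binary masks.

    `eps` is an assumed smoothing constant that avoids division by zero;
    with it, two empty masks score 1.0 (one common convention).
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * intersection + eps) / (total + eps)


# Toy example: prediction marks two voxels, ground truth marks one of them.
example_pred = [0, 1, 1, 0]
example_target = [0, 1, 0, 0]
example_dice = dice_score(example_pred, example_target)
```

Here the overlap is one voxel and the two masks contain three foreground voxels in total, so the score is approximately 2/3.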

Abstract

Accurate tumor segmentation in PET/CT images is crucial for computer-aided cancer diagnosis and treatment. The primary challenge lies in effectively integrating the complementary information from PET and CT images. In clinical settings, the quality of PET and CT images often varies significantly, leading to uncertainty in the modality information extracted by networks. To address this challenge, we propose a novel Multi-modal Evidential Fusion Network (MEFN), which consists of two core stages: Cross-Modal Feature Learning (CFL) and Multi-modal Trustworthy Fusion (MTF). The CFL stage aligns features across different modalities and learns more robust feature representations, thereby alleviating the negative effects of the domain gap. The MTF stage utilizes mutual attention mechanisms and an uncertainty calibrator to fuse modality features based on modality uncertainty, and then fuses the segmentation results under the guidance of Dempster-Shafer Theory. In addition, a new uncertainty perceptual loss is introduced to force the model to focus on uncertain features, thereby improving its ability to extract trusted modality information. Extensive comparative experiments on two publicly available PET/CT datasets demonstrate that MEFN significantly outperforms state-of-the-art methods, with DSC improvements of 3.10% and 3.23% on the AutoPET and Hecktor datasets, respectively. More importantly, our model can provide radiologists with credible uncertainty estimates for the segmentation results, supporting their decision to accept or reject the automatic segmentation, which is particularly important for clinical applications. Our code will be available at https://github.com/QPaws/MEFN.
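To make the idea of evidence-based fusion more concrete, the sketch below combines two per-voxel opinions, one from a PET branch and one from a CT branch, using the reduced Dempster combination rule that is common in evidential deep learning. The abstract does not give the paper's exact formulation, so everything here is an assumption: the mapping from evidence to Dirichlet parameters ($\alpha_k = e_k + 1$), the function names, and the toy evidence values are all hypothetical.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to (belief masses, uncertainty).

    Assumes Dirichlet parameters alpha_k = e_k + 1, so the Dirichlet
    strength is S = sum(alpha), beliefs are e_k / S, and the uncertainty
    mass is K / S. Beliefs and uncertainty sum to 1.
    """
    num_classes = len(evidence)
    strength = sum(e + 1.0 for e in evidence)
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


def dempster_combine(opinion_a, opinion_b):
    """Fuse two opinions with the reduced Dempster combination rule."""
    b1, u1 = opinion_a
    b2, u2 = opinion_b
    # Conflict: belief mass the two sources assign to different classes.
    conflict = sum(
        b1[i] * b2[j]
        for i in range(len(b1))
        for j in range(len(b2))
        if i != j
    )
    scale = 1.0 / (1.0 - conflict)
    beliefs = [scale * (x * y + x * u2 + y * u1) for x, y in zip(b1, b2)]
    uncertainty = scale * u1 * u2
    return beliefs, uncertainty


# Toy voxel with classes (background, tumor): PET is confident, CT is not.
pet_opinion = opinion_from_evidence([1.0, 9.0])
ct_opinion = opinion_from_evidence([0.5, 0.5])
fused_beliefs, fused_uncertainty = dempster_combine(pet_opinion, ct_opinion)
```

In this toy case the confident PET opinion dominates the fused belief, and the fused uncertainty is lower than that of either input. A low-evidence modality therefore contributes little to the final decision, which matches the behaviour the abstract describes.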

Paper Structure (21 sections, 19 equations, 9 figures, 4 tables)

This paper contains 21 sections, 19 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Framework of our proposed MEFN which consists of CFL stage and MTF stage, both with two branches for PET and CT, respectively. For simplicity, $Enc$ represents the encoder $Enc^{CT}$ or $Enc^{PET}$, $Dec_{m}$ represents the modality decoder $Dec_{m}^{CT}$ or $Dec_{m}^{PET}$, $Dec_{t}$ represents the tumor decoder $Dec_{t}^{CT}$ or $Dec_{t}^{PET}$, $Dis_{m}$ represents the modality discriminator $Dis_{m}^{CT}$ or $Dis_{m}^{PET}$, $Dis_{t}$ represents the shared tumor discriminator.
  • Figure 2: The architecture of (a) Tumor Guided Attention in CFL stage and (b) Dual-attention Feature Calibrating module in MTF stage.
  • Figure 3: Schematic illustration of the Dempster-Shafer Theory-based Trustworthy Fusion.
  • Figure 4: Segmentation results (blue contour) from different multi-modal segmentation methods of CT sample images for three patients from the AutoPET dataset. The ground truth is displayed in red contour. Each row represents segmentation results for the same patient. The bottom right corner of each image displays the enlarged version of the region indicated by the purple box. Red arrows highlight the mis-segmented regions.
  • Figure 5: Segmentation results from different multi-modal segmentation methods of CT sample images for different patients on the Hecktor dataset. Blue and yellow contours represent the segmentation results of the GTVp and GTVn tasks, respectively. Red and green contours represent the ground truth of the GTVp and GTVn tasks, respectively. Each row represents the segmentation results of the same patient. The bottom right corner of each image displays the enlarged version of the region indicated by the purple box. Red arrows highlight the mis-segmented regions.
  • ...and 4 more figures