Context-Gated Cross-Modal Perception with Visual Mamba for PET-CT Lung Tumor Segmentation

Elena Mulero Ayllón, Linlin Shen, Pierangelo Veltri, Fabrizia Gelardi, Arturo Chiti, Paolo Soda, Matteo Tortora

TL;DR

The paper tackles accurate lung tumor segmentation by fusing anatomical CT and metabolic PET information. It introduces vMambaX, a lightweight dual-branch framework based on Visual Mamba that employs a Context-Gated Cross-Modal Perception (CGM) module to adaptively gate features across modalities. CGM, together with cross-modality interaction, yields superior PET-CT fusion, achieving state-of-the-art performance on the PCLT20K dataset with lower computational cost than baselines. The work demonstrates the practicality of adaptive cross-modal gating for scalable, efficient multimodal tumor analysis in lung cancer, and provides code for broader adoption.

Abstract

Accurate lung tumor segmentation is vital for improving diagnosis and treatment planning, and effectively combining anatomical and functional information from PET and CT remains a major challenge. In this study, we propose vMambaX, a lightweight multimodal framework integrating PET and CT scan images through a Context-Gated Cross-Modal Perception Module (CGM). Built on the Visual Mamba architecture, vMambaX adaptively enhances inter-modality feature interaction, emphasizing informative regions while suppressing noise. Evaluated on the PCLT20K dataset, the model outperforms baseline models while maintaining lower computational complexity. These results highlight the effectiveness of adaptive cross-modal gating for multimodal tumor segmentation and demonstrate the potential of vMambaX as an efficient and scalable framework for advanced lung cancer analysis. The code is available at https://github.com/arco-group/vMambaX.

Paper Structure

This paper contains 9 sections, 6 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of the proposed vMambaX architecture (left) and detailed structure of the CGM mechanism (right).
  • Figure 2: Qualitative comparison of model segmentations, showing CT and PET inputs, ground truth, and predictions.