Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images

Jie Mei; Chenyu Lin; Yu Qiu; Yaonan Wang; Hui Zhang; Ziyang Wang; Dong Dai

TL;DR

This work addresses accurate lung tumor segmentation in PET-CT images, a task held back by scarce public data and by the limitations of existing multi-modal fusion. It introduces PCLT20K, a large public dataset with 21,930 PET-CT image pairs from 605 patients, and CIPA, a cross-modal interactive perception network built on Mamba selective state space models. CIPA combines a channel-wise rectification module and a dynamic cross-modality interaction module to exploit correlated information between PET and CT. It achieves state-of-the-art segmentation on PCLT20K and the STS dataset, supported by quantitative and qualitative analyses and ablations. The dataset and code are publicly available, providing a benchmark for multi-modal medical image analysis.

Abstract

Lung cancer is a leading cause of cancer-related deaths globally. PET-CT is crucial for imaging lung tumors because it provides essential metabolic and anatomical information, but it faces challenges such as poor image quality, motion artifacts, and complex tumor morphology. Deep learning-based models are expected to address these problems; however, existing datasets are small-scale and private, which limits significant performance improvements for these methods. Hence, we introduce a large-scale PET-CT lung tumor segmentation dataset, termed PCLT20K, which comprises 21,930 pairs of PET-CT images from 605 patients. Furthermore, we propose a cross-modal interactive perception network with Mamba (CIPA) for lung tumor segmentation in PET-CT images. Specifically, we design a channel-wise rectification module (CRM) that applies a channel state space block across multi-modal features to learn correlated representations and help filter out modality-specific noise. We also design a dynamic cross-modality interaction module (DCIM) to effectively integrate position and context information: it employs PET images to learn regional position information, which serves as a bridge to assist in modeling the relationships between local features of CT images. Extensive experiments on a comprehensive benchmark demonstrate the effectiveness of CIPA compared to current state-of-the-art segmentation methods. We hope our research can provide more exploration opportunities for medical image segmentation. The dataset and code are available at https://github.com/mj129/CIPA.
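
The idea of running a state space recurrence along the channel axis of PET and CT features can be illustrated with a toy sketch. This is not the CRM from the CIPA repository: the function names, the fixed scalar parameters `a`, `b`, `c`, the average pooling, and the sigmoid gating are all assumptions made for illustration. A Mamba-style block would use learned, input-dependent parameters instead of fixed ones.

```python
from math import exp
from typing import List, Tuple

# A feature map is a nested list with shape (channels, height, width).
FeatureMap = List[List[List[float]]]


def channel_descriptors(feat: FeatureMap) -> List[float]:
    """Average each channel over its spatial positions (one value per channel)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def channel_scan(seq: List[float], a: float = 0.5, b: float = 1.0,
                 c: float = 1.0) -> List[float]:
    """Linear recurrence h_k = a*h_{k-1} + b*x_k, y_k = c*h_k over a channel sequence.

    a, b, c are fixed hypothetical scalars, not learned or input-dependent.
    """
    h = 0.0
    out = []
    for x in seq:
        h = a * h + b * x
        out.append(c * h)
    return out


def scale_channels(feat: FeatureMap, gates: List[float]) -> FeatureMap:
    """Multiply every value in channel k by gates[k]."""
    return [[[g * v for v in row] for row in ch] for ch, g in zip(feat, gates)]


def rectify(pet: FeatureMap, ct: FeatureMap) -> Tuple[FeatureMap, FeatureMap]:
    """Gate PET and CT channels using a scan over their joined channel descriptors."""
    seq = channel_descriptors(pet) + channel_descriptors(ct)
    gates = [1.0 / (1.0 + exp(-y)) for y in channel_scan(seq)]
    n = len(pet)
    return scale_channels(pet, gates[:n]), scale_channels(ct, gates[n:])


if __name__ == "__main__":
    pet = [[[1.0, 1.0], [1.0, 1.0]]]  # one 2x2 PET channel
    ct = [[[2.0, 2.0], [2.0, 2.0]]]   # one 2x2 CT channel
    pet_out, ct_out = rectify(pet, ct)
    print(pet_out[0][0][0], ct_out[0][0][0])
```

Because the scan in this sketch runs in one direction only, the CT gates depend on the PET descriptors but not the reverse. A symmetric variant would need a second scan in the opposite order.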
