Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images

Jie Mei, Chenyu Lin, Yu Qiu, Yaonan Wang, Hui Zhang, Ziyang Wang, Dong Dai

TL;DR

This work addresses accurate lung tumor segmentation in PET-CT images, a task held back by scarce public data and by the limitations of existing multi-modal fusion methods. It introduces PCLT20K, a large public dataset with 21,930 PET-CT image pairs from 605 patients, and CIPA, a cross-modal interactive perception network built on Mamba selective state space models. CIPA combines a channel-wise rectification module (CRM) and a dynamic cross-modality interaction module (DCIM) to exploit correlated information between PET and CT. It achieves state-of-the-art segmentation on PCLT20K and the STS dataset, supported by quantitative and qualitative analyses and ablations. The dataset and code are publicly available as a benchmark for multi-modal medical image analysis.

Abstract

Lung cancer is a leading cause of cancer-related deaths globally. PET-CT is crucial for imaging lung tumors because it provides essential metabolic and anatomical information, but it faces challenges such as poor image quality, motion artifacts, and complex tumor morphology. Deep learning-based models are expected to address these problems; however, existing datasets are small-scale and private, which limits significant performance improvements for these methods. Hence, we introduce a large-scale PET-CT lung tumor segmentation dataset, termed PCLT20K, which comprises 21,930 pairs of PET-CT images from 605 patients. Furthermore, we propose a cross-modal interactive perception network with Mamba (CIPA) for lung tumor segmentation in PET-CT images. Specifically, we design a channel-wise rectification module (CRM) that implements a channel state space block across multi-modal features to learn correlated representations and help filter out modality-specific noise. A dynamic cross-modality interaction module (DCIM) is designed to effectively integrate position and context information: it employs PET images to learn regional position information, which serves as a bridge to assist in modeling the relationships between local features of CT images. Extensive experiments on a comprehensive benchmark demonstrate the effectiveness of CIPA compared to current state-of-the-art segmentation methods. We hope our research can provide more exploration opportunities for medical image segmentation. The dataset and code are available at https://github.com/mj129/CIPA.
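
To make the idea of a state space recurrence over channels more concrete, the following is a toy sketch and not the authors' CRM. It assumes each modality has already been reduced to one scalar descriptor per channel, concatenates the PET and CT descriptors into a single sequence, and runs a fixed scalar linear state space scan over that sequence to produce sigmoid gates. The function names (`ssm_scan`, `rectify_channels`), the parameters `a`, `b`, `c`, and the PET-then-CT ordering are all hypothetical, and the scan is non-selective, whereas Mamba makes its parameters depend on the input.

```python
import math
from typing import List


def ssm_scan(xs: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Scalar linear state space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # update the hidden state with the current token
        ys.append(c * h)   # read out the state
    return ys


def rectify_channels(pet: List[float], ct: List[float]) -> List[float]:
    """Gate per-channel descriptors with a scan over the joint channel sequence.

    Assumption: tokens are ordered PET channels first, then CT channels.
    """
    tokens = pet + ct
    gates = [1.0 / (1.0 + math.exp(-y)) for y in ssm_scan(tokens)]
    return [g * x for g, x in zip(gates, tokens)]


pet_desc = [1.0, 0.0]   # hypothetical per-channel PET descriptors
ct_desc = [2.0, -1.0]   # hypothetical per-channel CT descriptors
print(ssm_scan(pet_desc + ct_desc))
print(rectify_channels(pet_desc, ct_desc))
```

Because the state carries over from the PET tokens into the CT tokens, each CT gate depends on the PET descriptors that precede it, which is the kind of cross-modal dependency a channel-wise scan can express.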

Paper Structure

This paper contains 25 sections, 10 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Examples of PET-CT images and the corresponding tumors from the PCLT20K dataset. The metabolic data from PET enhances sensitivity to lesion location, while the anatomical details from CT support precise localization and morphological characterization.
  • Figure 2: (a) Proportional statistics of the number of slices per case. (b) Statistics of the pixel counts in tumor regions.
  • Figure 3: Distribution of tumor center points and sizes. (a) Distribution of tumors by center point coordinates. (b) Distribution of tumors by width and height.
  • Figure 4: (a) Overall architecture of our proposed cross-modal interactive perception network with Mamba (CIPA) for lung tumor segmentation in PET-CT images. CIPA consists of: (1) a channel-wise rectification module (CRM) to learn shared representations; (2) a dynamic cross-modality interaction module (DCIM) to integrate position and context information effectively. (b) Illustration of CRM.
  • Figure 5: (a) Illustration of dynamic cross-modality interaction module (DCIM), which mainly includes a convolutional stem, local Mamba block, and region Mamba block. The black dashed arrows indicate bypassing region Mamba blocks. (b) The structure of Mamba block.
  • ...and 2 more figures
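
The per-tumor quantities summarized in Figures 2(b) and 3 (pixel count, center point, width, and height) can be derived from a binary segmentation mask. The sketch below is one plausible way to compute them, assuming a 2D mask given as nested lists with nonzero values marking tumor pixels and a single tumor region per slice. The function name `tumor_stats` is hypothetical, and using the bounding-box center (rather than, say, the centroid) is an assumption; the source does not state which definition the authors used.

```python
from typing import Dict, List, Optional


def tumor_stats(mask: List[List[int]]) -> Optional[Dict[str, float]]:
    """Summarize the foreground of a 2D binary mask (nonzero = tumor)."""
    rows = [r for r, line in enumerate(mask) for v in line if v]
    cols = [c for line in mask for c, v in enumerate(line) if v]
    if not rows:
        return None  # the slice contains no tumor pixels
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    return {
        "pixels": len(rows),               # area in pixels
        "center_x": (left + right) / 2,    # bounding-box center (assumption)
        "center_y": (top + bottom) / 2,
        "width": right - left + 1,
        "height": bottom - top + 1,
    }


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
stats = tumor_stats(mask)
print(stats)
```

Collecting these dictionaries over all slices would give the raw values behind histograms like those in the figures; slices with several disconnected tumors would need connected-component labeling first, which this sketch omits.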