Table of Contents
Fetching ...

nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model

Haifan Gong, Luoyao Kang, Yitao Wang, Xiang Wan, Haofeng Li

TL;DR

This work tackles the difficulty of modeling long-range dependencies in 3D biomedical imaging by introducing nnMamba, a backbone that fuses CNNs with State Space Sequence Models ($SSMs$) via the MICCSS block. The approach includes an encoder–decoder segmentation framework with skip-scaling and a classification strategy that processes hierarchical multi-scale features as a sequence processed by $SSMs$. Across six datasets, nnMamba delivers state-of-the-art results for segmentation, classification, and landmark detection while maintaining high efficiency (e.g., ~15.55 MB parameters and 141.14 GFLOPS). The results demonstrate robust, scalable long-range context modeling with practical implications for clinical image analysis, and the authors provide code at the linked repository.

Abstract

In the field of biomedical image analysis, the quest for architectures capable of effectively capturing long-range dependencies is paramount, especially when dealing with 3D image segmentation, classification, and landmark detection. Traditional Convolutional Neural Networks (CNNs) struggle with locality respective field, and Transformers have a heavy computational load when applied to high-dimensional medical images.In this paper, we introduce nnMamba, a novel architecture that integrates the strengths of CNNs and the advanced long-range modeling capabilities of State Space Sequence Models (SSMs). Specifically, we propose the Mamba-In-Convolution with Channel-Spatial Siamese learning (MICCSS) block to model the long-range relationship of the voxels. For the dense prediction and classification tasks, we also design the channel-scaling and channel-sequential learning methods. Extensive experiments on 6 datasets demonstrate nnMamba's superiority over state-of-the-art methods in a suite of challenging tasks, including 3D image segmentation, classification, and landmark detection. nnMamba emerges as a robust solution, offering both the local representation ability of CNNs and the efficient global context processing of SSMs, setting a new standard for long-range dependency modeling in medical image analysis. Code is available at https://github.com/lhaof/nnMamba

nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model

TL;DR

This work tackles the difficulty of modeling long-range dependencies in 3D biomedical imaging by introducing nnMamba, a backbone that fuses CNNs with State Space Sequence Models () via the MICCSS block. The approach includes an encoder–decoder segmentation framework with skip-scaling and a classification strategy that processes hierarchical multi-scale features as a sequence processed by . Across six datasets, nnMamba delivers state-of-the-art results for segmentation, classification, and landmark detection while maintaining high efficiency (e.g., ~15.55 MB parameters and 141.14 GFLOPS). The results demonstrate robust, scalable long-range context modeling with practical implications for clinical image analysis, and the authors provide code at the linked repository.

Abstract

In the field of biomedical image analysis, the quest for architectures capable of effectively capturing long-range dependencies is paramount, especially when dealing with 3D image segmentation, classification, and landmark detection. Traditional Convolutional Neural Networks (CNNs) struggle with locality respective field, and Transformers have a heavy computational load when applied to high-dimensional medical images.In this paper, we introduce nnMamba, a novel architecture that integrates the strengths of CNNs and the advanced long-range modeling capabilities of State Space Sequence Models (SSMs). Specifically, we propose the Mamba-In-Convolution with Channel-Spatial Siamese learning (MICCSS) block to model the long-range relationship of the voxels. For the dense prediction and classification tasks, we also design the channel-scaling and channel-sequential learning methods. Extensive experiments on 6 datasets demonstrate nnMamba's superiority over state-of-the-art methods in a suite of challenging tasks, including 3D image segmentation, classification, and landmark detection. nnMamba emerges as a robust solution, offering both the local representation ability of CNNs and the efficient global context processing of SSMs, setting a new standard for long-range dependency modeling in medical image analysis. Code is available at https://github.com/lhaof/nnMamba
Paper Structure (13 sections, 3 equations, 4 figures, 6 tables, 1 algorithm)

This paper contains 13 sections, 3 equations, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: The nnMamba framework is designed for 3D biomedical tasks, focusing on dense prediction and classification. Our approach seeks to tackle the challenge of long-range modeling by leveraging the lightweight and robust long-range modeling capabilities of State Space Models.
  • Figure 2: Illustrative diagrams of the nnMamba framework architectures. (a) Presents the network structure for segmentation and landmark detection tasks. (b) Depicts the architecture tailored for classification tasks. Detailed structures of the blocks utilized within our networks are shown in (c), (d), and (e).
  • Figure 3: Visualization of segmentation predictions on the BraTS2023 dataset. From left to right, the columns display the MRI image, the prediction result of nnUNet, the prediction result of our nnMamba (MIC) model, the prediction result of our nnMamba (MICCSS) model, and the ground truth segmentation. This side-by-side comparison shows that our model can better capture the discontinue region of the tumor.
  • Figure 4: Visualization and comparison of segmentation predictions on the AMOS22 CT validation dataset. The four columns display, respectively, the CT image, the ground truth segmentation, the prediction result of nnUNet, and the prediction result of our nnMamba model. By capturing long-range dependencies, our nnMamba model effectively reduces over-segmentation and missed segmentation issues, particularly over long distances.