nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model
Haifan Gong, Luoyao Kang, Yitao Wang, Xiang Wan, Haofeng Li
TL;DR
This work tackles the difficulty of modeling long-range dependencies in 3D biomedical imaging by introducing nnMamba, a backbone that fuses CNNs with State Space Sequence Models ($SSMs$) via the MICCSS block. The approach includes an encoder–decoder segmentation framework with skip-scaling and a classification strategy that processes hierarchical multi-scale features as a sequence processed by $SSMs$. Across six datasets, nnMamba delivers state-of-the-art results for segmentation, classification, and landmark detection while maintaining high efficiency (e.g., ~15.55 MB parameters and 141.14 GFLOPS). The results demonstrate robust, scalable long-range context modeling with practical implications for clinical image analysis, and the authors provide code at the linked repository.
Abstract
In the field of biomedical image analysis, the quest for architectures capable of effectively capturing long-range dependencies is paramount, especially when dealing with 3D image segmentation, classification, and landmark detection. Traditional Convolutional Neural Networks (CNNs) struggle with locality respective field, and Transformers have a heavy computational load when applied to high-dimensional medical images.In this paper, we introduce nnMamba, a novel architecture that integrates the strengths of CNNs and the advanced long-range modeling capabilities of State Space Sequence Models (SSMs). Specifically, we propose the Mamba-In-Convolution with Channel-Spatial Siamese learning (MICCSS) block to model the long-range relationship of the voxels. For the dense prediction and classification tasks, we also design the channel-scaling and channel-sequential learning methods. Extensive experiments on 6 datasets demonstrate nnMamba's superiority over state-of-the-art methods in a suite of challenging tasks, including 3D image segmentation, classification, and landmark detection. nnMamba emerges as a robust solution, offering both the local representation ability of CNNs and the efficient global context processing of SSMs, setting a new standard for long-range dependency modeling in medical image analysis. Code is available at https://github.com/lhaof/nnMamba
