A Comprehensive Analysis of Mamba for 3D Volumetric Medical Image Segmentation

Chaohan Wang; Yutong Xie; Qi Chen; Yuyin Zhou; Qi Wu

TL;DR

This work systematically evaluates Mamba against Transformer architectures for 3D volumetric medical image segmentation on AMOS, TotalSegmentator, and BraTS. It introduces 3D depthwise convolutions and a multi-scale Mamba block (MSv4) to strengthen long-range and multi-scale representations, and it compares scanning strategies, finding that Tri-scan preserves spatial relationships most effectively, though at higher computational cost. A consolidated model, UlikeMamba_3dMT, combines 3D DWConv, MSv4, and Tri-scan and delivers competitive Dice scores at a lower computational load than leading baselines such as nnUNet, CoTr, UNETR, SwinUNETR, and U-Mamba. Overall, the results position Mamba as a practical and efficient alternative to Transformers for high-resolution 3D medical image segmentation, with guidance on when and how to apply multi-scale and scanning strategies.
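The Dice score used for comparison above is the standard overlap measure, 2|P ∩ G| / (|P| + |G|), for a predicted mask P and a ground-truth mask G. A minimal sketch in plain Python follows; the function name `dice_score` and the convention for two empty masks are assumptions made for illustration, not details taken from the paper's evaluation code.

```python
def dice_score(pred, target, label):
    """Dice coefficient for one label over two flat label sequences.

    `pred` and `target` are equal-length sequences of integer class labels,
    for example a segmentation volume flattened voxel by voxel.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    pred_count = sum(1 for p in pred if p == label)
    target_count = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)

    denominator = pred_count + target_count
    if denominator == 0:
        # Assumed convention: the label is absent from both masks,
        # so the two masks agree perfectly on it.
        return 1.0
    return 2.0 * overlap / denominator


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(dice_score(predicted, reference, label=1))
```

For multi-organ benchmarks, a per-label score like this is typically averaged over all foreground labels; how the paper aggregates scores is not stated in this section.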

Abstract

Mamba, with its selective State Space Models (SSMs), offers a more computationally efficient solution than Transformers for long-range dependency modeling. However, its effectiveness in high-resolution 3D medical image segmentation is still debated. In this study, we present a comprehensive investigation into Mamba's capabilities in 3D medical image segmentation by tackling three pivotal questions: Can Mamba replace Transformers? Can it elevate multi-scale representation learning? Is complex scanning necessary to unlock its full potential? We evaluate Mamba's performance across three large public benchmarks: AMOS, TotalSegmentator, and BraTS. Our findings reveal that UlikeMamba, a U-shaped Mamba-based network, consistently surpasses UlikeTrans, a U-shaped Transformer-based network, particularly when enhanced with custom-designed 3D depthwise convolutions, which improve both accuracy and computational efficiency. Further, our proposed multi-scale Mamba block captures both fine-grained details and global context better than its Transformer-based counterparts, especially in complex segmentation tasks. We also critically assess complex scanning strategies, finding that simpler methods often suffice, while our Tri-scan approach delivers notable advantages in the most challenging scenarios. By integrating these advancements, we introduce a new network for 3D medical image segmentation that outperforms leading models such as nnUNet, CoTr, and U-Mamba, offering competitive accuracy with superior computational efficiency. This study provides key insights into Mamba's unique advantages, paving the way for more efficient and accurate approaches to 3D medical imaging.
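To make the scanning question concrete: a sequence model such as an SSM consumes a 3D volume as a 1D sequence, so the flattening order decides which neighbouring voxels stay close together in that sequence. The sketch below is only an illustration of that general idea, flattening a D x H x W grid with each axis in turn as the fastest-varying one. It is not the paper's Tri-scan, which this section does not define, and the names `scan_orders` and `sequence_gap` are hypothetical.

```python
def scan_orders(depth, height, width):
    """Return three flattenings of a depth x height x width voxel grid.

    Each result is a list of (d, h, w) coordinates. The three lists differ
    in which axis varies fastest along the sequence.
    """
    width_fastest = [(d, h, w)
                     for d in range(depth)
                     for h in range(height)
                     for w in range(width)]
    height_fastest = [(d, h, w)
                      for d in range(depth)
                      for w in range(width)
                      for h in range(height)]
    depth_fastest = [(d, h, w)
                     for h in range(height)
                     for w in range(width)
                     for d in range(depth)]
    return width_fastest, height_fastest, depth_fastest


def sequence_gap(order, voxel_a, voxel_b):
    """Distance between two voxels measured along a flattened sequence."""
    position = {coord: index for index, coord in enumerate(order)}
    return abs(position[voxel_a] - position[voxel_b])


if __name__ == "__main__":
    width_fastest, height_fastest, depth_fastest = scan_orders(2, 3, 4)
    origin, depth_neighbour = (0, 0, 0), (1, 0, 0)
    # The same pair of spatially adjacent voxels, under two scan orders.
    print(sequence_gap(width_fastest, origin, depth_neighbour))
    print(sequence_gap(depth_fastest, origin, depth_neighbour))
```

In this toy grid, two voxels adjacent along the depth axis sit a full slice apart in the width-fastest sequence but next to each other in the depth-fastest one, which is the kind of spatial relationship a single scan direction can lose.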
