A Comprehensive Analysis of Mamba for 3D Volumetric Medical Image Segmentation

Chaohan Wang, Yutong Xie, Qi Chen, Yuyin Zhou, Qi Wu

TL;DR

This work systematically evaluates Mamba versus Transformer architectures for 3D volumetric medical image segmentation across AMOS, TotalSegmentator, and BraTS. It introduces 3D depthwise convolutions and a multi-scale Mamba block (MSv4) to enhance long-range and multi-scale representations, and explores scanning strategies, finding Tri-scan most effectively preserves spatial relationships albeit at higher cost. A consolidated model, UlikeMamba_3dMT, combining 3D DWConv, MSv4, and Tri-scan, delivers competitive Dice scores while reducing computational load relative to leading baselines like nnUNet, CoTr, UNETR, SwinUNETR, and U-Mamba. Overall, the results position Mamba as a practical and efficient alternative to Transformers for high-resolution 3D medical image segmentation, with clear guidance on when and how to apply multi-scale and scanning strategies for best performance.
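The Dice score used for the comparisons above measures the overlap between a predicted and a reference segmentation. A minimal NumPy sketch of the standard per-label definition follows; the function name `dice_score`, the toy volumes, and the convention for an absent label are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def dice_score(pred, target, label):
    """Dice coefficient 2*|P & T| / (|P| + |T|) for one label in two integer label volumes."""
    p = (pred == label)
    t = (target == label)
    intersection = np.logical_and(p, t).sum()
    denom = p.sum() + t.sum()
    if denom == 0:
        # Assumed convention: a label absent from both volumes counts as a perfect match.
        return 1.0
    return 2.0 * intersection / denom

# Toy 3D label volumes (hypothetical data).
pred = np.zeros((4, 4, 4), dtype=np.int64)
target = np.zeros((4, 4, 4), dtype=np.int64)
pred[1:3, 1:3, 1:3] = 1     # 8 predicted voxels
target[1:3, 1:3, 1:4] = 1   # 12 reference voxels, 8 of them overlapping

print(dice_score(pred, target, label=1))  # 2*8 / (8 + 12) = 0.8
```

Multi-organ benchmarks typically report this value averaged over labels and cases; the exact averaging used in the paper is not stated in this section.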

Abstract

Mamba, with its selective State Space Models (SSMs), offers a more computationally efficient solution than Transformers for long-range dependency modeling. However, there is still a debate about its effectiveness in high-resolution 3D medical image segmentation. In this study, we present a comprehensive investigation into Mamba's capabilities in 3D medical image segmentation by tackling three pivotal questions: Can Mamba replace Transformers? Can it elevate multi-scale representation learning? Is complex scanning necessary to unlock its full potential? We evaluate Mamba's performance across three large public benchmarks: AMOS, TotalSegmentator, and BraTS. Our findings reveal that UlikeMamba, a U-shaped Mamba-based network, consistently surpasses UlikeTrans, a U-shaped Transformer-based network, particularly when enhanced with custom-designed 3D depthwise convolutions, boosting accuracy and computational efficiency. Further, our proposed multi-scale Mamba block demonstrates superior performance in capturing both fine-grained details and global context, especially in complex segmentation tasks, surpassing Transformer-based counterparts. We also critically assess complex scanning strategies, finding that simpler methods often suffice, while our Tri-scan approach delivers notable advantages in the most challenging scenarios. By integrating these advancements, we introduce a new network for 3D medical image segmentation, positioning Mamba as a transformative force that outperforms leading models such as nnUNet, CoTr, and U-Mamba, offering competitive accuracy with superior computational efficiency. This study provides key insights into Mamba's unique advantages, paving the way for more efficient and accurate approaches to 3D medical imaging.
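To illustrate why a depthwise 3D convolution is cheap, the sketch below implements a generic one in NumPy and compares its weight count with a standard 3D convolution. It is not the authors' custom-designed layer, whose details are not given in this section; the function name `depthwise_conv3d`, the channel count, and the kernel size are illustrative assumptions.

```python
import numpy as np

def depthwise_conv3d(x, w):
    """Naive depthwise 3D convolution (cross-correlation, as in deep-learning frameworks).

    x: (C, D, H, W) feature volume.
    w: (C, k, k, k) weights, one kernel per channel, with odd k.
    Zero padding keeps the output the same shape as the input.
    """
    C, D, H, W = x.shape
    k = w.shape[1]
    r = k // 2
    xp = np.pad(x, ((0, 0), (r, r), (r, r), (r, r)))
    out = np.zeros_like(x, dtype=float)
    for dz in range(k):
        for dy in range(k):
            for dx in range(k):
                # Each channel is scaled only by its own kernel tap: no cross-channel mixing.
                tap = w[:, dz, dy, dx][:, None, None, None]
                out += tap * xp[:, dz:dz + D, dy:dy + H, dx:dx + W]
    return out

C, k = 32, 3  # illustrative sizes, not taken from the paper
x = np.random.default_rng(0).standard_normal((C, 8, 8, 8))

# Sanity check: a kernel with a single 1 at its centre reproduces the input.
w = np.zeros((C, k, k, k))
w[:, k // 2, k // 2, k // 2] = 1.0
y = depthwise_conv3d(x, w)

standard_params = C * C * k**3   # full 3D conv, C -> C channels: 27648 weights
depthwise_params = C * k**3      # depthwise 3D conv: 864 weights
print(standard_params, depthwise_params)
```

With the same number of input and output channels, the depthwise variant uses C times fewer weights than the standard convolution, which is the usual reason it is chosen when efficiency matters.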

Paper Structure

This paper contains 17 sections, 1 equation, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Mamba-based network (UlikeMamba) and Transformer-based network (UlikeTrans).
  • Figure 2: Left: detailed configurations of UlikeMamba_3d network. Here, 'K': kernel size of Conv, DWConv, or TransposeConv; 'C': number of channels; and 'S': stride. Right: Detailed configurations of UlikeTrans_SRA network. Here, 'R': reduction ratio of SRA; 'H': head number of SRA; and 'E': expansion ratio of FFN.
  • Figure 3: Four multi-scale modeling schemes for evaluating and comparing the long-range dependency modeling capabilities of Mamba and Transformers for multi-scale representation learning.
  • Figure 4: UlikeMamba_3d with different sequential scanning strategies.
  • Figure 5: Our proposed Mamba layer in UlikeMamba_3dMT, which modifies the original 1D depthwise convolution to 3D depthwise convolution, embraces a multi-scale strategy and incorporates tri-directional scanning to capture comprehensive spatial relationships in 3D volumetric data more effectively.
  • ...and 1 more figure
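The scanning strategies named in Figures 4 and 5 concern how a 3D volume is serialised into 1D sequences for Mamba. This section does not define Tri-scan precisely, so the sketch below shows only one plausible reading: flattening the volume three times, each time with a different spatial axis varying fastest. The function name `tri_scan_sequences` and the axis orderings are assumptions for illustration.

```python
import numpy as np

def tri_scan_sequences(vol):
    """Flatten a (D, H, W) volume into three 1D sequences.

    Each sequence visits every voxel exactly once, but a different spatial
    axis varies fastest, so different neighbours end up adjacent in the sequence.
    """
    # After transpose(axes), the last listed axis varies fastest in the flattened order.
    orders = {
        "W-fastest": (0, 1, 2),
        "H-fastest": (2, 0, 1),
        "D-fastest": (1, 2, 0),
    }
    return {name: vol.transpose(axes).reshape(-1) for name, axes in orders.items()}

# Toy volume whose values are the voxels' row-major indices (hypothetical data).
vol = np.arange(2 * 3 * 4).reshape(2, 3, 4)
seqs = tri_scan_sequences(vol)

print(seqs["W-fastest"][:4])  # [0 1 2 3]
print(seqs["H-fastest"][:3])  # [0 4 8]
print(seqs["D-fastest"][:4])  # [ 0 12  1 13]
```

Processing three sequences instead of one roughly triples the sequence-model work per layer, which is consistent with the summary's remark that Tri-scan preserves spatial relationships at a higher cost.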