MM-UNet: Meta Mamba UNet for Medical Image Segmentation

Bin Xie, Yan Yan, Gady Agam

TL;DR

MM-UNet addresses the challenge of applying State Space Models to 3D medical image segmentation by introducing a hybrid CNN-Mamba design and a bi-directional scan strategy. The model places Mamba modules within residual connections after two CNN layers, mitigating high-variance inputs and discontinuities from flattening 3D data. It achieves state-of-the-art Dice scores on AMOS2022 (91.0%) and Synapse (87.1%), outperforming nnUNet and other SSM-based architectures. This work demonstrates a principled integration of sequence models into 3D medical imaging and provides practical guidelines for architectural and scan-design choices in future segmentation systems.
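To make the flattening problem concrete, the following minimal sketch counts how many consecutive steps of a flattened 3D sequence connect voxels that are not spatial neighbours. The row-major (raster) order and the helper names `raster_order` and `count_jumps` are assumptions chosen for illustration; the scan orders actually used by MM-UNet are not specified in this summary.

```python
from itertools import product

def raster_order(depth, height, width):
    """Voxel coordinates in row-major order (width index varies fastest)."""
    return list(product(range(depth), range(height), range(width)))

def count_jumps(order):
    """Count consecutive sequence steps whose voxels are not 6-connected neighbours."""
    jumps = 0
    for (d0, h0, w0), (d1, h1, w1) in zip(order, order[1:]):
        manhattan = abs(d1 - d0) + abs(h1 - h0) + abs(w1 - w0)
        if manhattan > 1:  # adjacent in the sequence, distant in space
            jumps += 1
    return jumps

order = raster_order(2, 3, 4)   # a tiny 2 x 3 x 4 volume, 24 voxels
jumps = count_jumps(order)      # one jump at the end of every row but the last
print(len(order), jumps)
```

Every row boundary produces one jump, so the number of discontinuities grows with depth times height, which is one way to see why a 1D sequence model can struggle with flattened volumes.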

Abstract

State Space Models (SSMs) have recently demonstrated outstanding performance in long-sequence modeling, particularly in natural language processing. However, their direct application to medical image segmentation poses several challenges. SSMs, originally designed for 1D sequences, struggle with 3D spatial structures in medical images due to discontinuities introduced by flattening. Additionally, SSMs have difficulty fitting high-variance data, which is common in medical imaging. In this paper, we analyze the intrinsic limitations of SSMs in medical image segmentation and propose a unified U-shaped encoder-decoder architecture, Meta Mamba UNet (MM-UNet), designed to leverage the advantages of SSMs while mitigating their drawbacks. MM-UNet incorporates hybrid modules that integrate SSMs within residual connections, reducing variance and improving performance. Furthermore, we introduce a novel bi-directional scan order strategy to alleviate discontinuities when processing medical images. Extensive experiments on the AMOS2022 and Synapse datasets demonstrate the superiority of MM-UNet over state-of-the-art methods. MM-UNet achieves a Dice score of 91.0% on AMOS2022, surpassing nnUNet by 3.2%, and a Dice score of 87.1% on Synapse. These results confirm the effectiveness of integrating SSMs in medical image segmentation through architectural design optimizations.
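For readers unfamiliar with the metric reported above, here is a minimal sketch of the Dice score for a single label, Dice = 2|P ∩ T| / (|P| + |T|), computed over flattened label sequences. The function name `dice_score`, the toy labels, and the convention of returning 1.0 when a label is absent from both prediction and ground truth are assumptions for illustration; the paper's exact evaluation protocol is not described in this summary.

```python
def dice_score(pred, target, label):
    """Dice overlap for one label over two equally long flat label sequences."""
    pred_hits = [p == label for p in pred]
    target_hits = [t == label for t in target]
    intersection = sum(p and t for p, t in zip(pred_hits, target_hits))
    total = sum(pred_hits) + sum(target_hits)
    if total == 0:
        return 1.0  # assumed convention: label absent in both counts as perfect
    return 2.0 * intersection / total

# Toy flattened segmentation with background (0) and two organ labels (1, 2).
pred   = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]

per_label = {label: dice_score(pred, target, label) for label in (1, 2)}
mean_dice = sum(per_label.values()) / len(per_label)
print(per_label, mean_dice)
```

Multi-organ benchmarks are commonly summarized by averaging the per-label scores in this way, although the averaging scheme can differ between evaluation setups.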

Paper Structure

This paper contains 15 sections, 6 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: (a) Overview of our proposed MM-UNet architecture. (b) Experiments replacing meta blocks in MM-UNet with different modules, including pure CNN-based, hybrid, and pure SSM-based modules. Skip connections represent residual connections.
  • Figure 2: (1) Overview of our proposed MetaSSM architecture, where the MetaScan module is replaced with different scan orders. (2) Experimental evaluation of different MetaScan configurations.
  • Figure 3: The intensity distribution of feature maps inside and outside a residual connection from a pre-trained model, as well as from two sequential convolutional layers.
  • Figure 4: Experiments using S4 to fit flattened 2D medical images.
  • Figure 5: Visualization of attention maps of $\boldsymbol{Q}\boldsymbol{K}^{\boldsymbol{T}}$ for Mamba. Each attention map effectively captures image patterns across the temporal dimension, even when the 3D medical images are flattened into a 1D sequence as input for the MetaSSM blocks, highlighting the motivation for using SSMs in medical image segmentation.
  • ...and 2 more figures
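The idea of an SSM placed inside a residual connection and scanned in both directions, as the captions for Figures 1 and 2 suggest, can be illustrated with a toy scalar recurrence. This is a sketch only: the linear recurrence h_t = a·h_{t-1} + b·x_t, y_t = c·h_t, the fixed parameters, and the function names are assumptions, and it is neither Mamba's selective scan nor the paper's MetaSSM block.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Toy linear state space recurrence over a 1D sequence."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x   # state update
        ys.append(c * h)    # readout
    return ys

def residual_bidirectional_ssm(xs, a=0.5):
    """Add forward and backward scan outputs to the input (residual path)."""
    forward = ssm_scan(xs, a)
    backward = ssm_scan(xs[::-1], a)[::-1]  # scan the reversed sequence, then realign
    return [x + f + g for x, f, g in zip(xs, forward, backward)]

xs = [0.0, 0.0, 1.0, 0.0, 0.0]          # impulse in the middle of the sequence
out = residual_bidirectional_ssm(xs)
print(out)
```

With a single forward scan the impulse would only influence later positions; adding the backward scan lets it reach both sides, and the residual term keeps the original input in the output.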