LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image Segmentation

Weibin Liao, Yinghao Zhu, Xinyuan Wang, Chengwei Pan, Yasha Wang, Liantao Ma

TL;DR

This paper tackles the computational burden of high-capacity segmentation models by introducing LightM-UNet, a lightweight UNet variant that replaces CNN/Transformer components with Mamba-based blocks to capture global context with linear complexity. The architecture employs Residual Vision Mamba Layers and a Vision State-Space Module within an encoder–bottleneck–decoder framework, enabling deep semantic feature extraction while keeping the parameter footprint small (~1M). Extensive experiments on the 2D Montgomery&Shenzhen and 3D LiTS datasets demonstrate state-of-the-art performance with large reductions in parameters and computation compared to nnU-Net and U-Mamba. The work supports the viability of Mamba as a lightweight backbone for medical image segmentation and highlights its potential for mobile health applications.
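The "global context with linear complexity" claim rests on the state-space recurrence inside Mamba. Below is a minimal single-channel sketch of a selective scan. It is not LightM-UNet's code and rests on three assumptions:

- Mamba-style discretization: zero-order hold for a diagonal state matrix A, with the simplified input step Δ·B.
- Δ, B and C are computed from the current input through hypothetical scalar weights (`w_delta`, `b_delta`, `w_b`, `w_c`).
- The skip term is omitted.

Real implementations batch many channels and use optimized scan kernels. The sketch only shows that each token is visited once, so cost grows linearly with sequence length.

```python
import math
from typing import List, Tuple


def softplus(z: float) -> float:
    # Numerically stable log(1 + e^z); keeps the step size positive
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(x: List[float], a: List[float], w_delta: float, b_delta: float,
                   w_b: List[float], w_c: List[float]) -> Tuple[List[float], int]:
    """Single-channel selective SSM scan (illustrative sketch).

    x: input sequence for one channel (length L)
    a: diagonal of A, negative reals, one entry per state dimension (size N)
    w_delta, b_delta, w_b, w_c: hypothetical weights making delta, B, C input-dependent
    Returns the output sequence and the number of recurrence steps taken.
    """
    h = [0.0] * len(a)  # hidden state, size N
    y: List[float] = []
    steps = 0
    for x_t in x:
        delta = softplus(w_delta * x_t + b_delta)  # input-dependent step size
        b_t = [wb * x_t for wb in w_b]             # input-dependent B
        c_t = [wc * x_t for wc in w_c]             # input-dependent C
        # h_t = exp(delta * A) * h_{t-1} + (delta * B_t) * x_t
        h = [math.exp(delta * a_i) * h_i + delta * b_i * x_t
             for a_i, h_i, b_i in zip(a, h, b_t)]
        y.append(sum(c_i * h_i for c_i, h_i in zip(c_t, h)))  # y_t = C_t . h_t
        steps += 1
    return y, steps


y, steps = selective_scan([1.0, 0.5, -0.3, 2.0], a=[-1.0, -0.5],
                          w_delta=0.5, b_delta=0.0, w_b=[1.0, 0.2], w_c=[0.3, 1.0])
print(len(y), steps)  # 4 4
```

The step count equals the sequence length. The per-step work depends only on the state size N, not on how many tokens came before. This is the contrast with the pairwise cost of self-attention.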

Abstract

UNet and its variants have been widely used in medical image segmentation. However, these models, especially those based on Transformer architectures, pose challenges due to their large number of parameters and computational loads, making them unsuitable for mobile health applications. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as competitive alternatives to CNN and Transformer architectures. Building upon this, we employ Mamba as a lightweight substitute for CNN and Transformer within UNet, aiming to tackle challenges stemming from computational resource limitations in real medical settings. To this end, we introduce the Lightweight Mamba UNet (LightM-UNet) that integrates Mamba and UNet in a lightweight framework. Specifically, LightM-UNet leverages the Residual Vision Mamba Layer in a pure Mamba fashion to extract deep semantic features and model long-range spatial dependencies, with linear computational complexity. Extensive experiments conducted on two real-world 2D/3D datasets demonstrate that LightM-UNet surpasses existing state-of-the-art methods. Notably, when compared to the renowned nnU-Net, LightM-UNet achieves superior segmentation performance while reducing parameter and computation costs by 116× and 21×, respectively. This highlights the potential of Mamba in facilitating model lightweighting. Our code implementation is publicly available at https://github.com/MrBlankness/LightM-UNet.
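Linear complexity matters most for 3D inputs such as LiTS, where a flattened feature map becomes a long token sequence. The rough count below compares two quantities for cubic feature maps:

- token pairs scored by full self-attention, which grows quadratically;
- steps taken by a single state-space scan, which grows linearly.

The feature-map sizes are hypothetical, and the count ignores channel width, state size and constant factors. Scanning in more than one direction, for example, would multiply the scan count by a constant without changing its linear growth.

```python
def token_count(depth: int, height: int, width: int) -> int:
    # A C x D x H x W feature map flattens to D*H*W tokens of length C
    return depth * height * width


def attention_pairs(n_tokens: int) -> int:
    # Full self-attention scores every (query, key) pair: O(N^2)
    return n_tokens * n_tokens


def scan_steps(n_tokens: int) -> int:
    # A single state-space scan visits each token once: O(N)
    return n_tokens


for side in (8, 16, 32):  # hypothetical cubic feature-map sizes
    n = token_count(side, side, side)
    pairs, steps = attention_pairs(n), scan_steps(n)
    print(side, n, pairs, steps, pairs // steps)
# 8 512 262144 512 512
# 16 4096 16777216 4096 4096
# 32 32768 1073741824 32768 32768
```

The ratio between the two counts equals the token count itself. Doubling the side of a cubic volume therefore widens the gap eightfold.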

Paper Structure (13 sections, 4 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: (a) and (b) visualize the comparative experimental results on the LiTS bilic2023liver and Montgomery&Shenzhen jaeger2014two datasets, respectively. The center of each marker indicates the model's performance, and the marker's size indicates its number of parameters (larger markers mean more parameters). Legend colors indicate the base architecture each model uses.
  • Figure 2: The overall network architecture of LightM-UNet, with (a) the Residual Vision Mamba Layer (RVM Layer) and (b) the Vision State-Space Module (VSS Module).
  • Figure 3: Segmentation examples on the LiTS bilic2023liver (1st row; red indicates tumor, green indicates liver) and Montgomery&Shenzhen jaeger2014two (2nd row; red indicates lung) datasets. White arrows mark regions where the segmentation results of different models differ noticeably.
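Figure 2 names the RVM Layer and the VSS Module, but this summary does not give their internals. The sketch below is an illustration under an assumed structure, not the authors' code. It supposes the RVM Layer does three things:

- flattens a feature map into tokens;
- applies LayerNorm and a token mixer, then adds a residual scaled per channel;
- applies a second LayerNorm and a linear projection.

The scale vector `scale` and the projection matrix `proj` are hypothetical. The decaying-sum `scan_mixer` is a hypothetical stand-in for the VSS Module, which would use a selective SSM. A 3D volume would add one more spatial axis to the flattening loop.

```python
import math
from typing import List

Vec = List[float]


def flatten_feature_map(fmap: List[List[List[float]]]) -> List[Vec]:
    """fmap[c][h][w] (C x H x W) -> H*W tokens of length C, in raster order."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(channels)]
            for i in range(height) for j in range(width)]


def layer_norm(x: Vec, eps: float = 1e-5) -> Vec:
    # Normalize one token over its channels (learned affine omitted for brevity)
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def scan_mixer(tokens: List[Vec], decay: float = 0.9) -> List[Vec]:
    # Hypothetical VSS stand-in: per-channel recurrence h_t = decay * h_{t-1} + x_t,
    # computed in a single pass over the token sequence (linear in its length)
    h = [0.0] * len(tokens[0])
    out = []
    for x in tokens:
        h = [decay * hc + xc for hc, xc in zip(h, x)]  # new list each step
        out.append(h)
    return out


def rvm_layer(tokens: List[Vec], scale: Vec, proj: List[Vec]) -> List[Vec]:
    # Assumed form: M~ = Mixer(LN(M)) + scale * M, then M_out = Proj(LN(M~))
    mixed = scan_mixer([layer_norm(t) for t in tokens])
    resid = [[m + s * x for m, s, x in zip(mt, scale, xt)]
             for mt, xt in zip(mixed, tokens)]
    # proj has one row (of length C) per output channel
    return [[sum(w * v for w, v in zip(row, layer_norm(r))) for row in proj]
            for r in resid]


fmap = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
        [[0.5, 0.0, -1.0], [2.0, 1.0, 0.0]]]  # C=2, H=2, W=3
tokens = flatten_feature_map(fmap)            # 6 tokens of length 2
out = rvm_layer(tokens, scale=[1.0, 1.0],
                proj=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, -1.0]])
print(len(out), len(out[0]))  # 6 4
```

Keeping the mixer behind a plain function boundary makes the residual-and-projection wrapper independent of how the mixer is built. Swapping in a real selective scan would not change `rvm_layer`.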