UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation

Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang

TL;DR

Skin lesion segmentation on mobile devices requires high accuracy with very low parameter counts and GFLOPs. UltraLight VM-UNet builds on Vision Mamba and introduces a Parallel Vision Mamba Layer to parallelize deep-feature processing while keeping the total channel count fixed. The work provides a detailed analysis of Mamba parameter influences, introduces a plug-and-play PVM Layer that reduces parameters by about $93.7\%$ and GFLOPs to $0.060$, and demonstrates competitive Dice scores on ISIC2017, ISIC2018, and PH2 datasets. This approach offers a practical path for integrating Mamba-based modules as lightweight building blocks in mobile medical imaging.

Abstract

Traditionally, most approaches improve segmentation performance by adding more complex modules. This is ill-suited to the medical field, especially to mobile medical devices, where computationally heavy models cannot run in real clinical environments because of limited computational resources. Recently, state-space models (SSMs), represented by Mamba, have become strong competitors to traditional CNNs and Transformers. In this paper, we analyze in depth the key factors that drive the parameter count in Mamba and, based on this analysis, propose the UltraLight Vision Mamba UNet (UltraLight VM-UNet). Specifically, we propose a method for processing features with Vision Mamba in parallel, named the PVM Layer, which achieves excellent performance with the lowest computational load while keeping the overall number of processing channels constant. We ran comparison and ablation experiments against several state-of-the-art lightweight models on three public skin lesion datasets and showed that UltraLight VM-UNet remains equally competitive with only 0.049M parameters and 0.060 GFLOPs. In addition, the analysis of parameter influences in Mamba lays a theoretical foundation for Mamba to possibly become a mainstream lightweight module in the future. The code is available at https://github.com/wurenkai/UltraLight-VM-UNet .
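The idea of keeping the total number of processed channels constant while lowering the parameter count can be illustrated with simple arithmetic. The sketch below does not use Mamba. As a stand-in assumption, each branch is a bias-free channel-mixing linear map whose weight count grows with the square of its width. The function names, the choice of 64 channels and 4 groups, and the option to share one set of weights across branches are all hypothetical and are not taken from the paper.

```python
def linear_params(channels: int) -> int:
    """Weights in a bias-free channels -> channels linear map.

    Assumption: this quadratic cost is only a stand-in for a heavier block.
    """
    return channels * channels


def parallel_params(channels: int, groups: int, shared: bool) -> int:
    """Weights when the channels are split into equal groups.

    Each group of width channels // groups gets its own map, unless
    `shared` is True, in which case one map is reused for every group.
    """
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    per_branch = linear_params(channels // groups)
    return per_branch if shared else per_branch * groups


def reduction(before: int, after: int) -> float:
    """Fraction of weights removed when going from `before` to `after`."""
    return 1.0 - after / before


if __name__ == "__main__":
    channels, groups = 64, 4  # hypothetical sizes for illustration
    full = linear_params(channels)
    separate = parallel_params(channels, groups, shared=False)
    shared = parallel_params(channels, groups, shared=True)
    # The total channel count processed is the same in all three cases.
    print("single wide branch:", full)
    print("parallel, separate weights:", separate, reduction(full, separate))
    print("parallel, shared weights:", shared, reduction(full, shared))
```

These toy numbers are not meant to reproduce the reductions reported in the paper, which depend on the actual parameter structure of Mamba. They only show why narrower parallel branches can be much cheaper than one wide branch over the same channels.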

Paper Structure

This paper contains 20 sections, 18 equations, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: Visualization of the comparison results on the ISIC2017 dataset. The x-axis shows parameters and GFLOPs (lower is better). The y-axis shows segmentation performance (DSC; higher is better).
  • Figure 2: (a) The proposed UltraLight Vision Mamba UNet (UltraLight VM-UNet) model architecture. (b) Multilevel and multiscale information fusion module architecture for skip-connection paths.
  • Figure 3: (a) Architecture of the proposed Parallel Vision Mamba Layer (PVM Layer). Vision Mamba (VM) is composed of Mamba combined with a residual connection and an adjustment factor. (b) Structure of Mamba.
  • Figure 4: Settings for ablation experiments with Vision Mamba used in different parallel ways (PVM Layer).
  • Figure 5: Visual State Space (VSS) Block composition structure.
  • ...and 2 more figures
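
The Figure 3 caption describes a layer that applies Vision Mamba to channel groups in parallel, where each Vision Mamba unit combines Mamba with a residual connection and an adjustment factor. The following is a minimal structural sketch of that description, not the authors' implementation. It works on a single flat channel vector, `toy_inner` is a hypothetical placeholder standing in for Mamba, and the names `vision_mamba_unit`, `parallel_layer`, and `scale` are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Inner = Callable[[Sequence[float]], Vector]


def vision_mamba_unit(x: Sequence[float], inner: Inner, scale: float) -> Vector:
    """Inner block output plus a scaled residual: inner(x) + scale * x."""
    y = inner(x)
    if len(y) != len(x):
        raise ValueError("inner block must preserve the channel count")
    return [yi + scale * xi for yi, xi in zip(y, x)]


def parallel_layer(x: Sequence[float], groups: int, inner: Inner, scale: float) -> Vector:
    """Split channels into equal groups, process each group, concatenate."""
    if len(x) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    width = len(x) // groups
    out: Vector = []
    for g in range(groups):
        chunk = x[g * width:(g + 1) * width]
        out.extend(vision_mamba_unit(chunk, inner, scale))
    return out  # same number of channels as the input


def toy_inner(chunk: Sequence[float]) -> Vector:
    """Hypothetical stand-in for Mamba: replace each value by the chunk mean."""
    mean = sum(chunk) / len(chunk)
    return [mean for _ in chunk]


if __name__ == "__main__":
    features = [1.0, 3.0, 2.0, 4.0]
    print(parallel_layer(features, groups=2, inner=toy_inner, scale=0.5))
```

Because the same `inner` callable is passed to every group, this sketch reuses one block across branches. Whether weights are shared in the paper's layer is not stated in the text above, so treat that as a simplification.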