
H-vmunet: High-order Vision Mamba UNet for Medical Image Segmentation

Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang

TL;DR

The paper identifies limitations of conventional CNNs and ViTs for medical image segmentation and uses state-space modeling with the 2D-selective-scan (SS2D) to capture global context efficiently. It introduces High-order SS2D (H-SS2D) and the High-order Visual State Space (H-VSS) module, and embeds them in a UNet-like architecture to form H-vmunet. The higher-order interactions are designed to progressively reduce the redundant information introduced during SS2D operations. The authors report that H-vmunet achieves competitive segmentation performance on ISIC2017, Spleen, and CVC-ClinicDB while using $67.28\%$ fewer parameters than VM-UNet. This work extends SS2D-based vision models toward a parameter- and memory-efficient framework for medical image segmentation.
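To make the 2D-selective-scan concrete, here is a minimal NumPy sketch of the scan-and-merge pattern used by VMamba-style SS2D, assuming the four-direction layout (row-major, column-major, and the reverse of each). The recurrence is a deliberately simplified stand-in for a Mamba selective scan: it keeps one scalar state per channel and shares parameters across directions. The names `cross_scan`, `selective_scan`, `cross_merge`, and `ss2d` are hypothetical and are not taken from the H-vmunet repository.

```python
import numpy as np

def cross_scan(x):
    # Flatten a (C, H, W) map into four (C, H*W) sequences: row-major,
    # column-major, and the reverse of each.
    c, h, w = x.shape
    row = x.reshape(c, h * w)
    col = x.transpose(0, 2, 1).reshape(c, h * w)
    return np.stack([row, col, row[:, ::-1], col[:, ::-1]])  # (4, C, L)

def cross_merge(seqs, h, w):
    # Undo each scan order and sum the four outputs back into a (C, H, W) map.
    c = seqs.shape[1]
    row = seqs[0] + seqs[2][:, ::-1]
    col = seqs[1] + seqs[3][:, ::-1]
    return row.reshape(c, h, w) + col.reshape(c, w, h).transpose(0, 2, 1)

def softplus(z):
    return np.log1p(np.exp(z))

def selective_scan(u, w_delta, a, b, c_out):
    # Simplified selective scan over u of shape (C, L), one scalar state per channel.
    # The step size delta depends on the current input, so the decay
    # exp(-delta * a) is input-dependent: the "selective" part of Mamba.
    h = np.zeros(u.shape[0])
    y = np.empty_like(u)
    for t in range(u.shape[1]):
        delta = softplus(w_delta * u[:, t])  # positive, input-dependent step
        h = np.exp(-delta * a) * h + delta * b * u[:, t]
        y[:, t] = c_out * h
    return y

def ss2d(x, params):
    # Scan in four directions, run the recurrence on each sequence, and merge.
    _, h, w = x.shape
    outs = np.stack([selective_scan(s, *params) for s in cross_scan(x)])
    return cross_merge(outs, h, w)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 3))  # (channels, height, width)
params = (np.full(2, 0.5), np.full(2, 0.1), np.ones(2), np.ones(2))  # w_delta, a, b, c_out
y = ss2d(x, params)
```

Because the reversed scans visit the bottom-right pixel before the top-left one, every output position can depend on every input position; this is how a scan that is linear in sequence length still gives each pixel a global receptive field.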

Abstract

In the field of medical image segmentation, variant models that use Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) as base modules have been widely developed and applied. However, CNNs are often limited in their ability to model long-range dependencies, while ViTs' low sensitivity to local features and their quadratic computational complexity limit their development. Recently, the emergence of state-space models (SSMs), especially the 2D-selective-scan (SS2D), has challenged the long-standing dominance of CNNs and ViTs as the foundational modules of visual neural networks. In this paper, we extend the adaptability of SS2D by proposing a High-order Vision Mamba UNet (H-vmunet) for medical image segmentation. The proposed High-order 2D-selective-scan (H-SS2D) progressively reduces the introduction of redundant information during SS2D operations through higher-order interactions. In addition, the proposed Local-SS2D module improves SS2D's ability to learn local features at each order of interaction. We conducted comparison and ablation experiments on three publicly available medical image datasets (ISIC2017, Spleen, and CVC-ClinicDB), and the results demonstrate the strong competitiveness of H-vmunet in medical image segmentation tasks. The code is available at https://github.com/wurenkai/H-vmunet .
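The abstract describes H-SS2D as refining features through progressively higher-order interactions, with Local-SS2D adding local features at each order. A minimal sketch, assuming H-SS2D follows the recursive gating introduced for high-order spatial interactions in HorNet's g^n convolution, is shown below. `global_mix` is a crude placeholder for SS2D, `local_mix` is a 3x3 box filter standing in for a learned depthwise convolution, and `local_ss2d` is a hypothetical reading of Local-SS2D as their sum. All weights are random, so this illustrates data flow and shapes only, not the paper's implementation.

```python
import numpy as np

def global_mix(x):
    # Placeholder for SS2D (hypothetical): add the per-channel spatial mean to every
    # position, so each output sees the whole map. Not the paper's operator.
    return x + x.mean(axis=(1, 2), keepdims=True)

def local_mix(x):
    # 3x3 box filter with zero padding, standing in for a learned depthwise conv.
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0

def local_ss2d(x):
    # Hypothetical Local-SS2D: a global branch plus a local branch.
    return global_mix(x) + local_mix(x)

def conv1x1(weight, x):
    # Pointwise (1x1) convolution: weight (C_out, C_in), x (C_in, H, W).
    return np.einsum("oc,chw->ohw", weight, x)

def high_order_interaction(x, order, rng):
    """g^n-style recursive gating with local_ss2d as the spatial mixer (assumed design)."""
    c = x.shape[0]
    assert c % 2 ** (order - 1) == 0, "channels must be divisible by 2**(order-1)"
    dims = [c // 2 ** (order - 1 - k) for k in range(order)]  # e.g. [4, 8, 16] for c=16
    # Input projection to dims[0] + sum(dims) channels, split into p0 and the gates q_k.
    w_in = rng.standard_normal((dims[0] + sum(dims), c)) / np.sqrt(c)
    proj = conv1x1(w_in, x)
    p = proj[:dims[0]]
    qs = np.split(proj[dims[0]:], np.cumsum(dims)[:-1])
    p = p * local_ss2d(qs[0])  # 1st-order interaction
    for k in range(1, order):
        # Widen p to dims[k] channels, then gate it with the next mixed branch.
        w_k = rng.standard_normal((dims[k], dims[k - 1])) / np.sqrt(dims[k - 1])
        p = conv1x1(w_k, p) * local_ss2d(qs[k])
    w_out = rng.standard_normal((c, c)) / np.sqrt(c)
    return conv1x1(w_out, p)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))
y = high_order_interaction(x, order=3, rng=rng)
print(y.shape)  # (16, 8, 8)
```

With `order=3` and 16 channels, the gates have widths 4, 8, and 16, so each additional order gates a progressively wider feature map. Under the stated assumption, this mirrors the 1st- to 3rd-order interactions the paper labels H$_1$-SS2D through H$_3$-SS2D.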

Paper Structure

This paper contains 18 sections, 15 equations, 7 figures, 1 table, and 1 algorithm.

Figures (7)

  • Figure 1: (a) The proposed High-order Vision Mamba UNet (H-vmunet) model architecture. (b) Multi-level and multi-scale information fusion module architecture.
  • Figure 2: Illustration of the 2D-selective-scan (SS2D).
  • Figure 3: (a) The proposed High-order Visual State Space (H-VSS) module architecture. (b) Overview of the first-order and third-order 2D-selective-scan (H$_1$-SS2D and H$_3$-SS2D). (c) Overview of the proposed Local-SS2D module.
  • Figure 4: Visualization of segmentation results from the comparison experiments.
  • Figure 5: Comparison of the parameter counts and memory usage of the proposed H-vmunet with those of the traditional High-order spatial interaction UNet (MHorUNet) model and the pure Vision Mamba UNet (VM-UNet) model.
  • ...and 2 more figures