
VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation

Mingya Zhang, Yue Yu, Limei Gu, Tingsheng Lin, Xianping Tao

TL;DR

Inspired by the Mamba architecture, Vision Mamba-UNetV2 (VM-UNetV2) is proposed. It introduces the Visual State Space (VSS) Block to capture extensive contextual information and the Semantics and Detail Infusion (SDI) module to improve the infusion of low-level and high-level features.

Abstract

In the field of medical image segmentation, models based on both CNNs and Transformers have been thoroughly investigated. However, CNNs have limited capability to model long-range dependencies, which makes it difficult to fully exploit the semantic information within images. Transformers, on the other hand, are challenged by the quadratic computational complexity of self-attention. Recently, State Space Models (SSMs) such as Mamba have been recognized as a promising alternative: they not only demonstrate superior performance in modeling long-range interactions but also preserve linear computational complexity in sequence length. Inspired by the Mamba architecture, we propose Vision Mamba-UNetV2 (VM-UNetV2), in which the Visual State Space (VSS) Block is introduced to capture extensive contextual information and the Semantics and Detail Infusion (SDI) module is introduced to improve the infusion of low-level and high-level features. We conduct comprehensive experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir, CVC-ColonDB, and ETIS-LaribPolypDB public datasets. The results indicate that VM-UNetV2 achieves competitive performance on medical image segmentation tasks. Our code is available at https://github.com/nobodyplayer1/VM-UNetV2.
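The linear-complexity claim for SSMs comes from their recurrent form: each step updates a fixed-size hidden state, so a length-L sequence costs O(L·N) for state size N instead of the O(L²) pairwise interactions of self-attention. Below is a minimal sketch of a discretized, time-varying diagonal SSM scan in plain Python. It is not code from the VM-UNetV2 repository. The function name `ssm_scan`, the zero-order-hold discretization, and passing the per-step Δ, B and C in directly (Mamba derives them from the input) are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def ssm_scan(x: Sequence[float], delta: Sequence[float], A: Sequence[float],
             B: Sequence[Sequence[float]], C: Sequence[Sequence[float]]) -> List[float]:
    """Sequential scan of a diagonal, discretized state space model (illustrative sketch).

    x[t]      scalar input at step t
    delta[t]  positive step size at step t (assumed given; input-dependent in Mamba)
    A[n]      nonzero diagonal state-matrix entries (negative for a stable decay)
    B[t][n]   input projection at step t
    C[t][n]   output projection at step t
    Runs in O(L * N) time for sequence length L and state size N.
    """
    n_state = len(A)
    h = [0.0] * n_state  # fixed-size hidden state; its size does not grow with L
    y: List[float] = []
    for t in range(len(x)):
        for n in range(n_state):
            a_bar = math.exp(delta[t] * A[n])        # ZOH: A_bar = exp(delta * A)
            b_bar = (a_bar - 1.0) / A[n] * B[t][n]   # ZOH for diagonal A: (A_bar - 1) / A * B
            h[n] = a_bar * h[n] + b_bar * x[t]       # h_t = A_bar * h_{t-1} + B_bar * x_t
        y.append(sum(C[t][n] * h[n] for n in range(n_state)))  # y_t = C_t . h_t
    return y
```

For example, with a single state (`A=[-1.0]`), unit step sizes and unit projections, the impulse input `[1.0, 0.0, 0.0]` yields an output that decays by a factor of e⁻¹ per step. Each time step is visited once, so doubling L doubles the work. In VM-UNetV2 the scan runs inside the SS2D operator of the VSS Block over image features; this 1-D sketch leaves that 2-D scanning out.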

Paper Structure (12 sections, 8 equations, 2 figures, 5 tables)

Figures (2)

  • Figure 1: The overall architecture of the Vision Mamba UNetV2 model, which consists of an Encoder module, an SDI (Semantics and Detail Infusion) module (peng2023u), and a Decoder module.
  • Figure 2: (a) The VSS Block, which serves as the backbone of VM-UNetV2; SS2D is the core of the VSS Block. (b) The Semantics and Detail Infusion (SDI) module, which takes the backbone's output features and processes them with an attention module; SDI produces output features at the same (level-dependent) sizes as the backbone's output features.
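Figure 2(b) describes SDI only at the block level. Below is a minimal sketch of the fusion step, assuming the SDI design of the cited U-Net v2 work (peng2023u): for a target level, every level's feature map is resized to that level's resolution (pooled down if larger, upsampled if smaller) and the resized maps are combined by element-wise (Hadamard) multiplication. To stay self-contained, the sketch uses single-channel square maps stored as nested lists, omits the attention and convolution layers, and uses nearest-neighbour upsampling. The helper names are hypothetical and this is not the VM-UNetV2 implementation.

```python
from typing import List

Grid = List[List[float]]  # single-channel square feature map (assumption)


def avg_pool(x: Grid, size: int) -> Grid:
    """Average-pool a square map down to size x size (side lengths assumed to divide evenly)."""
    k = len(x) // size
    return [[sum(x[i * k + a][j * k + b] for a in range(k) for b in range(k)) / (k * k)
             for j in range(size)] for i in range(size)]


def upsample_nearest(x: Grid, size: int) -> Grid:
    """Nearest-neighbour upsampling of a square map to size x size (stand-in for bilinear)."""
    k = size // len(x)
    return [[x[i // k][j // k] for j in range(size)] for i in range(size)]


def resize(x: Grid, size: int) -> Grid:
    """Bring a map to the target resolution: pool if larger, upsample if smaller, copy if equal."""
    if len(x) > size:
        return avg_pool(x, size)
    if len(x) < size:
        return upsample_nearest(x, size)
    return [row[:] for row in x]


def sdi_fuse(levels: List[Grid], target: int) -> Grid:
    """Resize every level to the target level's size and fuse them by Hadamard product."""
    size = len(levels[target])
    out = [[1.0] * size for _ in range(size)]
    for feat in levels:
        r = resize(feat, size)
        for i in range(size):
            for j in range(size):
                out[i][j] *= r[i][j]  # element-wise multiplication across levels
    return out
```

Calling `sdi_fuse` once for each level index yields one fused map per level at that level's own resolution. The multiplicative fusion lets each level's map act as a gate on the others, rather than simply being added to them.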