
ASP-VMUNet: Atrous Shifted Parallel Vision Mamba U-Net for Skin Lesion Segmentation

Muyi Bao, Shuchang Lyu, Zhaoyang Xu, Qi Zhao, Changyu Zeng, Wenpei Bai, Guangliang Cheng

TL;DR

This work tackles skin lesion segmentation by combining Vision Mamba with an Atrous Shifted Parallel U-Net. The core ideas are an atrous scan that broadens the receptive field without discarding any patches, a shift round operation that improves information exchange between segments, and SE/SK blocks that fuse local and global information within a parallel CNN-Mamba framework. Across ISIC16/17/18 and PH2, ASP-VMUNet achieves state-of-the-art results with a favorable parameter and compute footprint, and comprehensive ablations examine the individual components, the atrous step, and the encoder depth. The approach offers a practical, plug-and-play path to improving medical image segmentation with hybrid architectures that balance accuracy and efficiency.

Abstract

Skin lesion segmentation is a critical challenge in computer vision, and accurately separating pathological features from healthy skin is essential for diagnosis. Traditional Convolutional Neural Networks (CNNs) are limited by narrow receptive fields, while Transformers impose a significant computational burden. This paper presents a novel skin lesion segmentation framework, the Atrous Shifted Parallel Vision Mamba UNet (ASP-VMUNet), which integrates the efficient and scalable Mamba architecture to overcome the limitations of both CNNs and computationally demanding Transformers. The framework introduces an atrous scan technique that minimizes background interference and expands the receptive field, enhancing Mamba's scanning capabilities. In addition, a Parallel Vision Mamba (PVM) layer and a shift round operation optimize feature segmentation and foster rich information exchange between segments. A supplementary CNN branch with a Selective-Kernel (SK) Block further refines the segmentation by blending local and global contextual information. Evaluated on four benchmark datasets (ISIC16/17/18 and PH2), ASP-VMUNet delivers superior skin lesion segmentation performance, supported by comprehensive ablation studies. The approach advances medical image segmentation and highlights the benefits of hybrid architectures in medical imaging. Our code is available at https://github.com/BaoBao0926/ASP-VMUNet/tree/main.
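The atrous scan can be illustrated with a short NumPy sketch that follows the three steps listed in the Figure 1 caption: pad, sample into sub-images, flatten. This is not the authors' implementation (see the repository above). The function name `atrous_scan`, zero padding on the bottom/right edges, the sub-image ordering, and concatenating all sub-images into a single sequence are assumptions made for illustration.

```python
import numpy as np

def atrous_scan(x: np.ndarray, step: int) -> np.ndarray:
    """Hypothetical sketch of an atrous scan on a (C, H, W) feature map.

    Assumptions (not from the paper): zero padding on the bottom/right,
    sub-images ordered by (row offset, column offset), each flattened
    row-major, and all sub-images concatenated into one (C, L) sequence.
    """
    c, h, w = x.shape
    # Step 1: pad so that H and W are divisible by the atrous step.
    pad_h = (-h) % step
    pad_w = (-w) % step
    x = np.pad(x, ((0, 0), (0, pad_h), (0, pad_w)))  # constant (zero) padding
    # Step 2: sample step*step sub-images; neighbouring pixels in a
    # sub-image are `step` pixels apart in the original map.
    subs = [x[:, i::step, j::step] for i in range(step) for j in range(step)]
    # Step 3: flatten each sub-image and join them into one 1D sequence
    # per channel, ready to be fed to a sequence model such as Mamba.
    return np.concatenate([s.reshape(c, -1) for s in subs], axis=1)
```

For a 1×4×4 map holding the values 0..15 and a step of 2, the first sub-image contributes 0, 2, 8, 10 to the sequence, so consecutive tokens are two pixels apart. No pixel is dropped; each one lands in exactly one sub-image.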

Paper Structure

This paper contains 28 sections, 2 equations, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Atrous scan process: First, pad the image to ensure the height and width are divisible by the atrous step. Then, sample the image into sub-images. Finally, flatten the sub-images into a 1D sequence for input into the Mamba block.
  • Figure 2: (a) Overview of the ASP-VMUNet architecture. The network follows the U-Net framework with 6 stages in both the encoder and the decoder. In both, the first stage is a CNN block and each subsequent stage is an ASP block. Skip connections incorporate shared-weight Spatial Attention Blocks (SAB) and Channel Attention Blocks (CAB). (b) The SAB and CAB blocks, which use an attention mechanism to fuse multi-level and multi-scale features.
  • Figure 3: (a) The architecture of the Atrous Shifted Parallel (ASP) Block, which consists of a Mamba branch and a CNN branch, followed by a shared-weight SE block and an SK block. This structure is repeated twice. The first Mamba block is the Atrous Parallel Vision Mamba (APVM) block, shown in (b); the second is the Atrous Shifted Parallel Vision Mamba (ASPVM) block, which adds the shift round operation, shown in (c).
  • Figure 4: The architecture of SE Block.
  • Figure 5: The architecture of SK Block.
  • ...and 4 more figures
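Figures 4 and 5 show the SE and SK blocks. The NumPy sketch below follows the standard Squeeze-and-Excitation and Selective-Kernel formulations. It is illustrative only and is not taken from the ASP-VMUNet code. The function names, the weight shapes, and treating the SK block as a fusion of two same-shape inputs (for example the Mamba-branch and CNN-branch outputs of Figure 3) are assumptions. Batch normalization and biases are omitted for brevity.

```python
import numpy as np

def _sigmoid(v: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-v))

def se_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Hypothetical Squeeze-and-Excitation on a (C, H, W) map.

    w1: (C // r, C) reduction weights; w2: (C, C // r) expansion weights.
    """
    s = x.mean(axis=(1, 2))            # squeeze: global average pool -> (C,)
    z = np.maximum(w1 @ s, 0.0)        # FC + ReLU -> (C // r,)
    gate = _sigmoid(w2 @ z)            # FC + sigmoid -> per-channel gate in (0, 1)
    return x * gate[:, None, None]     # excitation: rescale each channel

def sk_fuse(branches: list, w_reduce: np.ndarray, w_select: np.ndarray) -> np.ndarray:
    """Hypothetical Selective-Kernel-style fusion of K same-shape (C, H, W) inputs.

    w_reduce: (d, C); w_select: (K, C, d). BN is omitted (assumption).
    """
    u = np.stack(branches)                      # (K, C, H, W)
    s = u.sum(axis=0).mean(axis=(1, 2))         # fuse by summation, then GAP -> (C,)
    z = np.maximum(w_reduce @ s, 0.0)           # compact descriptor -> (d,)
    logits = w_select @ z                       # per-branch, per-channel scores -> (K, C)
    logits -= logits.max(axis=0, keepdims=True) # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=0, keepdims=True)     # softmax across the K branches
    return (attn[:, :, None, None] * u).sum(axis=0)  # attention-weighted sum -> (C, H, W)
```

The softmax runs across branches independently for each channel. Each output channel is therefore a convex combination of the inputs, and the network can lean toward the local (CNN) or the global (Mamba) features on a per-channel basis. With all-zero weights, the SE gate is 0.5 everywhere and the SK fusion reduces to the plain average of the branches.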