UNetMamba: An Efficient UNet-Like Mamba for Semantic Segmentation of High-Resolution Remote Sensing Images

Enze Zhu, Zhan Chen, Dingkai Wang, Hanru Shi, Xiaoxuan Liu, Lei Wang

TL;DR

UNetMamba is a UNet-like, Mamba-based semantic segmentation model. It incorporates a Mamba segmentation decoder (MSD) that efficiently decodes the complex information within high-resolution images, and a local supervision module (LSM) that is used only during training but significantly enhances the perception of local content.

Abstract

Semantic segmentation of high-resolution remote sensing images is vital in downstream applications such as land-cover mapping, urban planning and disaster assessment. Existing Transformer-based methods face a trade-off between accuracy and efficiency, while the recently proposed Mamba is renowned for being efficient. To overcome this dilemma, we propose UNetMamba, a UNet-like semantic segmentation model based on Mamba. It incorporates a Mamba segmentation decoder (MSD) that can efficiently decode the complex information within high-resolution images, and a local supervision module (LSM), which is train-only but can significantly enhance the perception of local contents. Extensive experiments demonstrate that UNetMamba outperforms the state-of-the-art methods, with mIoU increased by 0.87% on LoveDA and 0.39% on ISPRS Vaihingen, while achieving high efficiency through a lightweight design, a smaller memory footprint and reduced computational cost. The source code is available at https://github.com/EnzeZhu2001/UNetMamba.
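
The accuracy gains in the abstract are reported in mIoU (mean intersection over union). As a reading aid, the sketch below shows one common way to compute mIoU from flattened per-pixel class labels via a confusion matrix. It is a minimal illustration, not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou`, the `ignore_index` parameter, and the choice to skip classes absent from both prediction and ground truth are all assumptions made here.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=None):
    """Count pixels per (ground-truth class, predicted class) pair.

    pred and target are flat sequences of integer class ids.
    Rows index the ground truth, columns index the prediction.
    Pixels whose ground truth equals ignore_index are skipped (assumption).
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # class-c pixels predicted as another class
        fp = sum(matrix[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 3 classes and 6 pixels (made-up labels, not dataset values).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, target, num_classes=3)
miou = mean_iou(cm)
```

In this toy example the per-class IoUs are 1/3, 2/3 and 1/2, so `miou` is 0.5. Benchmark protocols differ in which classes are ignored or averaged, so the dataset's own evaluation rules take precedence over this sketch.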

Paper Structure (12 sections, 9 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Framework of the proposed UNetMamba. (a) Overall architecture of UNetMamba. (b) Visual State Space (VSS) block of the Mamba Segmentation Decoder (MSD). (c) LSM block of the Local Supervision Module (LSM).
  • Figure 2: Qualitative comparison on the LoveDA dataset at a resolution of 1024 × 1024 pixels. (a) Original image, (b) Ground Truth, (c) the proposed UNetMamba, (d) BANet, (e) MANet, (f) DC-Swin, (g) UNetFormer, (h) E-PyramidMamba, (i) CM-UNet and (j) RS$^3$Mamba.
  • Figure 3: Qualitative comparison on the ISPRS Vaihingen dataset at a resolution of 1024 × 1024 pixels. (a) Original image, (b) Ground Truth, (c) the proposed UNetMamba, (d) BANet, (e) MANet, (f) DC-Swin, (g) UNetFormer, (h) E-PyramidMamba, (i) CM-UNet and (j) RS$^3$Mamba.