Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion

Yice Cao; Chenchen Liu; Zhenhua Wu; Wenxin Yao; Liu Xiong; Jie Chen; Zhixiang Huang

TL;DR

This paper proposes CVMH-UNet, a hybrid semantic segmentation network based on vision Mamba. On remote sensing imagery datasets, CVMH-UNet achieves superior segmentation performance while maintaining low computational complexity, outperforming current leading segmentation algorithms.

Abstract

As remote sensing imaging technology continues to advance, processing high-resolution and diverse satellite imagery to improve segmentation accuracy and interpretation efficiency has emerged as a pivotal area of investigation in remote sensing. Although segmentation algorithms based on CNNs and Transformers have achieved significant progress in performance, balancing segmentation accuracy against computational complexity remains challenging, which limits their wide application in practical tasks. To address this, this paper introduces a state space model (SSM) and proposes a novel hybrid semantic segmentation network based on vision Mamba (CVMH-UNet). The method designs a cross-scanning visual state space block (CVSSBlock) that uses cross 2D scanning (CS2D) to capture global information from multiple directions. The block also incorporates convolutional neural network branches to overcome the constraints of Vision Mamba (VMamba) in acquiring local information, so that both global and local features are analyzed. Furthermore, to address the limited discriminative power of direct skip connections and their difficulty in achieving detailed fusion, a multi-frequency multi-scale feature fusion block (MFMSBlock) is designed. This module introduces multi-frequency information through the 2D discrete cosine transform (2D DCT) to enhance information utilization, and provides local detail information at additional scales through point-wise convolution branches. Finally, it aggregates multi-scale information along the channel dimension, achieving refined feature fusion. Experiments on well-known remote sensing imagery datasets demonstrate that the proposed CVMH-UNet achieves superior segmentation performance while maintaining low computational complexity, outperforming current leading segmentation algorithms.
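To make the idea of "multi-frequency information through 2D DCT" more concrete, the following is a minimal NumPy sketch of projecting a feature map onto a few 2D DCT-II basis functions to obtain one scalar per channel. It is an illustration only and not the authors' MFMSBlock: the function names `dct2_basis` and `multi_frequency_descriptor`, the even channel grouping, and the choice of frequency indices are all assumptions made for this example.

```python
import numpy as np


def dct2_basis(h, w, u, v):
    """Un-normalized 2D DCT-II basis function for frequency index (u, v)."""
    ys = np.arange(h)
    xs = np.arange(w)
    basis_y = np.cos(np.pi * u * (ys + 0.5) / h)  # vertical cosine, shape (h,)
    basis_x = np.cos(np.pi * v * (xs + 0.5) / w)  # horizontal cosine, shape (w,)
    return np.outer(basis_y, basis_x)             # shape (h, w)


def multi_frequency_descriptor(feat, freqs):
    """Project channel groups of `feat` (C, H, W) onto DCT bases.

    Hypothetical helper: channels are split into len(freqs) groups, and each
    group is reduced with one (u, v) basis, giving one scalar per channel.
    """
    c, h, w = feat.shape
    groups = np.array_split(np.arange(c), len(freqs))
    out = np.empty(c, dtype=float)
    for idx, (u, v) in zip(groups, freqs):
        basis = dct2_basis(h, w, u, v)
        # Weighted spatial sum; (u, v) = (0, 0) reduces to a plain spatial sum.
        out[idx] = (feat[idx] * basis).sum(axis=(1, 2))
    return out


feat = np.ones((4, 8, 8))
desc = multi_frequency_descriptor(feat, [(0, 0), (0, 1)])
```

With the frequency index (0, 0) the basis is constant, so the projection equals global average pooling up to a scale factor; non-zero indices respond to spatial variation that plain pooling discards. How such descriptors are turned into weights and fused with the point-wise convolution branches is specific to the paper and is not reproduced here.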

Paper Structure (28 sections, 11 equations, 6 figures, 7 tables)

Introduction
Related Work
Vision State Space Models
Attention Mechanisms in Deep Learning
Skip Connections in Deep Learning
Methodology
Overall Architecture
CVSSBlock
MFMSBlock
Remote Sensing Image Segmentation Based on CVMH-UNet
Dataset and Experimental Setting
Datasets
ISPRS Vaihingen
ISPRS Potsdam
Experimental Setting
...and 13 more sections

Figures (6)

Figure 1: The overall architecture of CVMH-UNet.
Figure 2: (a) Overall architecture of the CVSSBlock, (b) Detailed structure of the Cross Scan, (c) Detailed structure of the E-FNN, (d) Connection method of the CVSSBlock.
Figure 3: Comparison of two different scanning methods, (a) Four scanning paths of SS2D, (b) Four scanning paths of CS2D.
Figure 4: (a) Structure of MFMSBlock (b) Structure of MFMS-AM.
Figure 5: Segmentation results of different methods on the ISPRS Vaihingen dataset.
...and 1 more figures
