SFFNet: A Wavelet-Based Spatial and Frequency Domain Fusion Network for Remote Sensing Segmentation
Yunsong Yang, Genji Yuan, Jinjiang Li
TL;DR
SFFNet tackles remote sensing segmentation under large grayscale variations by fusing spatial features with frequency-domain information in a two-stage network. The Global and Local branches provide robust spatial modeling, while the Wavelet Transform Feature Decomposer adds low- and high-frequency cues; the two are bridged by the Multiscale Dual-Representation Alignment Filter, which performs semantic alignment and feature selection. Empirical results on Vaihingen and Potsdam show state-of-the-art performance, with mIoU reaching $84.80\%$ and $87.73\%$ respectively, along with improved convergence and robustness in shadowed and edge regions. The approach offers a practical, efficient pathway to more reliable remote sensing segmentation by balancing spatial detail with frequency information, which improves performance in challenging scenes.
Abstract
To fully utilize spatial information for segmentation and to handle areas with significant grayscale variations in remote sensing images, we propose SFFNet (Spatial and Frequency Domain Fusion Network). The framework uses a two-stage design: the first stage extracts features with spatial methods to obtain sufficient spatial detail and semantic information; the second stage maps these features in both the spatial and frequency domains. For the frequency-domain mapping, we introduce the Wavelet Transform Feature Decomposer (WTFD), which decomposes features into low-frequency and high-frequency components using the Haar wavelet transform and integrates them with spatial features. To bridge the semantic gap between frequency and spatial features, and to select significant features so that features from different representation domains combine well, we design the Multiscale Dual-Representation Alignment Filter (MDAF), which uses multiscale convolutions and dual-cross attention. Comprehensive experimental results demonstrate that SFFNet outperforms existing methods in terms of mIoU, reaching 84.80% and 87.73% on the Vaihingen and Potsdam datasets, respectively. The code is located at https://github.com/yysdck/SFFNet.
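As an illustration of the decomposition step the abstract describes, the following is a minimal NumPy sketch of a single-level 2D Haar wavelet transform applied to a feature map. It is not the authors' WTFD implementation: the function names `haar_dwt2` and `haar_idwt2`, the `(C, H, W)` layout, and the orthonormal scaling are assumptions made for this example, and how WTFD integrates the resulting components with spatial features is not shown.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform of an array shaped (..., H, W).

    H and W are assumed to be even. Returns one low-frequency band and a
    tuple of three high-frequency bands, each of shape (..., H/2, W/2).
    """
    a = x[..., 0::2, 0::2]  # top-left sample of every 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right

    low = (a + b + c + d) / 2.0        # scaled block average (low frequency)
    row_diff = (a + b - c - d) / 2.0   # top row minus bottom row
    col_diff = (a - b + c - d) / 2.0   # left column minus right column
    diag = (a - b - c + d) / 2.0       # diagonal difference
    return low, (row_diff, col_diff, diag)


def haar_idwt2(low, highs):
    """Invert haar_dwt2 and recover the original (..., H, W) array."""
    row_diff, col_diff, diag = highs
    out_shape = low.shape[:-2] + (low.shape[-2] * 2, low.shape[-1] * 2)
    x = np.empty(out_shape, dtype=low.dtype)
    x[..., 0::2, 0::2] = (low + row_diff + col_diff + diag) / 2.0
    x[..., 0::2, 1::2] = (low + row_diff - col_diff - diag) / 2.0
    x[..., 1::2, 0::2] = (low - row_diff + col_diff - diag) / 2.0
    x[..., 1::2, 1::2] = (low - row_diff - col_diff + diag) / 2.0
    return x


if __name__ == "__main__":
    # Hypothetical feature map: 8 channels at 16x16 resolution.
    features = np.random.default_rng(0).normal(size=(8, 16, 16))
    low, highs = haar_dwt2(features)
    print(low.shape, [h.shape for h in highs])
    print(np.allclose(haar_idwt2(low, highs), features))
```

With this scaling the transform is orthonormal, so the four bands together carry exactly the information in the input at half the spatial resolution. The low band is a smoothed copy of the feature map, while the three high bands respond to local intensity changes such as edges and shadow boundaries.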
