FSDENet: A Frequency and Spatial Domains based Detail Enhancement Network for Remote Sensing Semantic Segmentation

Jiahao Fu, Yinfeng Yu, Liejun Wang

TL;DR

FSDENet addresses semantic segmentation in high-resolution remote sensing imagery plagued by grayscale variations that blur object boundaries. It fuses spatial-domain feature processing with global frequency-domain cues using FFT and Haar wavelet transforms, enabling edge-aware, boundary-robust segmentation. The method introduces four modules—MASF, CAGF, FFDP, and HWDE—for cross-scale fusion, global interaction, and frequency-domain detail enhancement. Across LoveDA, Vaihingen, Potsdam, and iSAID, FSDENet achieves state-of-the-art performance with competitive computational cost, demonstrating the practical value of dual-domain detail enhancement for remote sensing tasks.

Abstract

To fully leverage spatial information for remote sensing image segmentation and address semantic edge ambiguities caused by grayscale variations (e.g., shadows and low-contrast regions), we propose the Frequency and Spatial Domains based Detail Enhancement Network (FSDENet). Our framework employs spatial processing methods to extract rich multi-scale spatial features and fine-grained semantic details. By integrating global and frequency-domain information through the Fast Fourier Transform (FFT) in global mappings, we significantly strengthen the model's capability to discern global representations under grayscale variations. Additionally, we utilize the Haar wavelet transform to decompose features into high- and low-frequency components, leveraging their distinct sensitivity to edge information to refine boundary segmentation. The model achieves dual-domain synergy by integrating spatial granularity with frequency-domain edge sensitivity, substantially improving segmentation accuracy in boundary regions and grayscale transition zones. Comprehensive experimental results demonstrate that FSDENet achieves state-of-the-art (SOTA) performance on four widely adopted datasets: LoveDA, Vaihingen, Potsdam, and iSAID.
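
To make the two frequency-domain tools named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' FFDP or HWDE modules. It shows a single-level 2D Haar decomposition into one low-frequency and three high-frequency sub-bands, and an element-wise reweighting of the FFT spectrum, which acts globally because every spectral coefficient depends on every pixel. The function names (`haar_dwt2`, `haar_idwt2`, `spectral_filter`), the orthonormal scaling by 1/2, and the toy 4x4 image are all assumptions made for illustration.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform of an (H, W) array with even H and W.

    Returns four sub-bands (ll, lh, hl, hh), each of shape (H/2, W/2).
    Sub-band naming conventions vary between libraries; these names are illustrative.
    """
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # top minus bottom: responds to horizontal edges
    hl = (a - b + c - d) / 2.0  # left minus right: responds to vertical edges
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the (H, W) array from its four sub-bands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def spectral_filter(x, weight):
    """Reweight the 2D FFT spectrum of x element-wise and transform back.

    `weight` has the same shape as x. In a network it would typically be
    learned; here it is just an input (an assumption for this sketch).
    """
    spectrum = np.fft.fft2(x)
    return np.fft.ifft2(spectrum * weight).real


# Toy image: a vertical step edge between column 0 and column 1.
img = np.tile(np.array([0.0, 1.0, 1.0, 1.0]), (4, 1))

ll, lh, hl, hh = haar_dwt2(img)
# Only the sub-band sensitive to vertical edges is non-zero at the step.
print("hl:\n", hl)
print("lh all zero:", np.allclose(lh, 0.0))

# The decomposition is lossless: the inverse recovers the input.
print("reconstruction ok:", np.allclose(haar_idwt2(ll, lh, hl, hh), img))

# Keeping only the zero-frequency coefficient yields the global mean everywhere,
# illustrating that one spectral weight affects the whole spatial map.
dc_only = np.zeros_like(img)
dc_only[0, 0] = 1.0
print("dc-only output:\n", spectral_filter(img, dc_only))
```

In this sketch the step edge shows up only in the high-frequency sub-band that differences left and right pixels, while the low-frequency sub-band carries the smoothed content; this is the kind of separation the abstract relies on when it attributes distinct edge sensitivity to high- and low-frequency components.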

Paper Structure

This paper contains 19 sections, 28 equations, 13 figures, 9 tables.

Figures (13)

  • Figure 1: Current challenges in remote sensing image segmentation: regions with large grayscale changes, such as shadows and low-contrast areas, show obvious semantic ambiguity and are difficult to segment accurately. The first row shows locally zoomed crops of the original image, the second row the ground-truth (GT) labels, the third row the FT-UNetFormer results, the fourth row the results of SFFNet, the latest SOTA method, and the fifth row the FSDENet results. FT-UNetFormer, which uses only spatial information, performs poorly in shaded, low-contrast regions (e.g., a car obscured by shadow, which makes its low-contrast boundary inconspicuous). SFFNet, which adds frequency-domain information, significantly mitigates such problems. Our method makes full use of frequency-domain information to achieve better results.
  • Figure 2: The overall network architecture of the proposed FSDENet. ConvNeXt-Small extracts multi-scale features, which are unified to the scale of $X_2$. MASF aligns the receptive fields of features at different scales, CAGF supplements global information and performs feature interaction, FFDP efficiently introduces frequency-domain information into the global information, and the result is fused with the detail-enhanced information from HWDE. The segmentation head then generates the final segmentation result.
  • Figure 3: An illustration of the FFDP Block.
  • Figure 4: An illustration of the MASF Block.
  • Figure 5: An illustration of the CAGF Block.
  • ...and 8 more figures