Adaptive Frequency Enhancement Network for Remote Sensing Image Semantic Segmentation

Feng Gao, Miao Fu, Jingchao Cao, Junyu Dong, Qian Du

TL;DR

AFENet addresses the challenge of adapting network parameters to diverse land-cover distributions in remote sensing image segmentation by introducing the Adaptive Frequency and Spatial feature Interaction Module (AFSIM) and the Selective feature Fusion Module (SFM). AFSIM adaptively separates high- and low-frequency information using FFT-based analysis and an Adaptive Window-mask Module (AWM), while SFM selectively fuses global context with local details through cross-domain attention. The model, built on a ResNet-18 backbone and reinforced by Transformer-based fusion, achieves state-of-the-art results on Vaihingen, Potsdam, and LoveDA, and its components are validated via comprehensive ablations. The work highlights substantial gains in edge precision and multi-scale segmentation, and the code is publicly available to support reproducibility and further research.

Abstract

Semantic segmentation of high-resolution remote sensing images plays a crucial role in land-use monitoring and urban planning. Recent progress in deep learning-based methods has made it possible to generate satisfactory segmentation results. However, existing methods still face challenges in adapting network parameters to various land cover distributions and in enhancing the interaction between spatial and frequency domain features. To address these challenges, we propose the Adaptive Frequency Enhancement Network (AFENet), which integrates two key components: the Adaptive Frequency and Spatial feature Interaction Module (AFSIM) and the Selective feature Fusion Module (SFM). AFSIM dynamically separates and modulates high- and low-frequency features according to the content of the input image. It adaptively generates two masks to separate high- and low-frequency components, thereby providing detail and contextual supplementary information for ground object feature representation. SFM selectively fuses global context and local detailed features to enhance the network's representation capability, which further strengthens the interaction between frequency and spatial features. Extensive experiments on three publicly available datasets demonstrate that the proposed AFENet outperforms state-of-the-art methods. We also validate the effectiveness of AFSIM and SFM in handling diverse land cover types and complex scenarios. Our code is available at https://github.com/oucailab/AFENet.
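
The mask-based separation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' AFSIM: the function name `split_frequencies` and the parameter `window_ratio` are hypothetical, and the window size here is fixed by hand, whereas the paper's Adaptive Window-mask Module derives it from the image content. The sketch only shows how two complementary masks applied to a centered FFT spectrum yield a low-frequency and a high-frequency component.

```python
import numpy as np


def split_frequencies(image, window_ratio=0.25):
    """Split a 2-D array into low- and high-frequency parts.

    Assumption: a fixed, centered rectangular window stands in for the
    adaptively generated mask described in the paper.
    """
    if not 0.0 < window_ratio <= 1.0:
        raise ValueError("window_ratio must be in (0, 1]")
    h, w = image.shape

    # Move the zero-frequency term to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Low-pass mask: ones inside a centered window, zeros elsewhere.
    low_mask = np.zeros((h, w), dtype=float)
    half_h = max(1, int(h * window_ratio / 2))
    half_w = max(1, int(w * window_ratio / 2))
    ch, cw = h // 2, w // 2
    low_mask[ch - half_h:ch + half_h + 1, cw - half_w:cw + half_w + 1] = 1.0

    # The high-pass mask is the complement, so the two parts sum to the input.
    high_mask = 1.0 - low_mask

    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * high_mask)).real
    return low, high


rng = np.random.default_rng(0)
img = rng.random((32, 32))
low, high = split_frequencies(img)
```

Because the two masks are complementary, `low + high` reconstructs the input up to floating-point error, and the high-frequency part has zero mean since the zero-frequency term falls inside the low-pass window.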

Paper Structure

This paper contains 26 sections, 22 equations, 10 figures, 8 tables, 1 algorithm.

Figures (10)

  • Figure 1: Results of frequency separation in remote sensing images from the LoveDA dataset across rural (first row) and urban (second row) scenes. The first column shows the original images, and the second column presents the corresponding FFT spectra, reflecting the frequency characteristics of different scenes caused by variations in object types and texture distributions. The third and fourth columns display high-frequency and low-frequency feature maps, respectively, obtained via our dynamic window mechanism.
  • Figure 2: Overview of the proposed AFENet framework. The Adaptive Frequency Enhancement Block (AFEB) module acquires high-frequency and low-frequency features through an adaptive frequency separation mechanism and employs a cross-attention mechanism to facilitate the interaction between spatial and frequency domain features. The Transformer Block (TB) further refines and completes contextual information, while promoting deeper feature integration.
  • Figure 3: Structure of the selective feature fusion module (SFM). High-frequency $F_h$ and low-frequency $F_l$ features are processed through average pooling and max pooling, concatenated, and used to generate a weight mask for feature fusion.
  • Figure 4: Visualization of segmentation performance after removing different components from AFENet on the ISPRS Vaihingen dataset, focusing on magnified local areas. (a) represents AFENet; (b) represents AFENet without AFSIM-Low; (c) represents AFENet without AFSIM-High; (d) represents AFENet with AWM+Fixed; and (e) represents AFENet with SFM+Add.
  • Figure 5: Visualization of segmentation performance for different AFSIM configurations on the ISPRS Vaihingen dataset, focusing on magnified local areas. (a) Baseline. (b) Baseline with the low-frequency branch of AFSIM. (c) Baseline with the high-frequency branch of AFSIM. (d) Baseline with the complete AFSIM.
  • ...and 5 more figures
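
The fusion step summarized in the Figure 3 caption (pool both features, concatenate, derive a weight mask, fuse) might look roughly like the NumPy sketch below. Several details are assumptions rather than facts from the paper: pooling is taken along the channel axis, the mask comes from a hypothetical per-pixel linear projection (`proj_weights`) followed by a sigmoid, and the fusion is a convex combination of the two inputs. The function name `selective_fuse` is likewise illustrative.

```python
import numpy as np


def selective_fuse(f_h, f_l, proj_weights=None):
    """Fuse high- and low-frequency features of shape (C, H, W) with a weight mask.

    Assumptions: channel-wise pooling, a 4-to-1 linear projection in place of
    a learned layer, and a convex combination as the fusion rule.
    """
    # Average and max pooling over channels, concatenated into (4, H, W).
    pooled = np.stack([
        f_h.mean(axis=0), f_h.max(axis=0),
        f_l.mean(axis=0), f_l.max(axis=0),
    ])

    if proj_weights is None:
        # Hypothetical fixed weights; a trained model would learn these.
        proj_weights = np.array([1.0, 1.0, -1.0, -1.0])

    # Per-pixel projection to one channel, then sigmoid to get a mask in (0, 1).
    logits = np.tensordot(proj_weights, pooled, axes=1)  # shape (H, W)
    mask = 1.0 / (1.0 + np.exp(-logits))

    # The mask broadcasts over channels and weights the two inputs.
    fused = mask * f_h + (1.0 - mask) * f_l
    return fused, mask


rng = np.random.default_rng(0)
f_h = rng.standard_normal((8, 4, 4))
f_l = rng.standard_normal((8, 4, 4))
fused, mask = selective_fuse(f_h, f_l)
```

With this fusion rule each output value lies between the corresponding values of `f_h` and `f_l`, so the mask acts as a per-pixel selector between detail and context.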