SAIP-Net: Enhancing Remote Sensing Image Segmentation via Spectral Adaptive Information Propagation

Zhongtao Wang, Xizhe Cao, Yisong Chen, Guoping Wang

TL;DR

This paper tackles semantic segmentation of remote sensing imagery, where large intra-class variance and boundary precision are major challenges. It introduces SAIP-Net, a frequency-aware framework that leverages Spectral Adaptive Information Propagation, combining a Transformer-based encoder with a Spectral Adaptive Feature Fusion decoder, Composite Dilated Convolutions, and a Learnable High-Pass Filter Stem. The approach jointly suppresses disruptive high-frequency noise within regions and sharpens boundary details while expanding receptive fields for better multi-scale context. Experimental results on Potsdam and LoveDA demonstrate improved intra-class consistency and boundary accuracy with competitive model complexity, and ablations validate the contribution of each spectral and architectural component, albeit with some limitations on low-texture classes. The work highlights the practical potential of integrating spectral priors with efficient context modeling for robust remote sensing segmentation.

Abstract

Semantic segmentation of remote sensing imagery demands precise spatial boundaries and robust intra-class consistency, challenging conventional hierarchical models. To address limitations arising from spatial domain feature fusion and insufficient receptive fields, this paper introduces SAIP-Net, a novel frequency-aware segmentation framework that leverages Spectral Adaptive Information Propagation. SAIP-Net employs adaptive frequency filtering and multi-scale receptive field enhancement to effectively suppress intra-class feature inconsistencies and sharpen boundary lines. Comprehensive experiments demonstrate significant performance improvements over state-of-the-art methods, highlighting the effectiveness of spectral-adaptive strategies combined with expanded receptive fields for remote sensing image segmentation.

Paper Structure

This paper contains 26 sections, 12 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Overview of the challenges and motivations behind SAIP-Net. Remote sensing images often exhibit large intra-class variance, small inter-class differences, and irregular class layouts. To address these challenges, SAIP-Net combines four modules that improve intra-class consistency and boundary accuracy, which in turn improves the segmentation of complex structures in remote sensing images.
  • Figure 2: Overview of the proposed SAIP-Net architecture. The network combines a Learnable High-Pass Filter Stem that enhances edge details, a Transformer Encoder that extracts global context via multi-stage learnable pooling, and a Frequency-Aware Decoder that fuses high- and low-level features using SAFF modules, with integrated Composite Dilated Convolution Layers that expand the receptive field. Together, these modules improve intra-class consistency and boundary accuracy, and thereby the segmentation of complex structures in remote sensing images.
  • Figure 3: (a) Illustration of the Spectral Adaptive Feature Fusion (SAFF) module. The module integrates high-level and low-level features using content-guided low-pass and high-pass filters alongside spatial offset estimation. The structure of $\mathcal{F}^{\text{HP}}$ is enlarged at the bottom; $\mathcal{F}^{\text{LP}}$ shares a similar architecture. (b) The module enhances intra-class consistency and boundary accuracy at the feature level, leading to better segmentation results.
  • Figure 4: Composite Dilated Convolution Framework: After a channel-mixing operation, we split the feature map into multiple parts processed with different dilation rates, and the resulting multi-scale features are concatenated, refined, and upsampled to reconstruct high-resolution segmentation maps.
  • Figure 5: (a) Overview of the proposed LhpfStem. The module is composed of a stack of Lhpf layers. In an Lhpf layer, the high-pass output is computed by subtracting the low-pass response from the input features, thereby enhancing edge details and fine structures. (b) The module extracts key high-frequency features, sharpens feature boundaries, and improves segmentation performance.
  • ...and 4 more figures
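
The captions for Figures 4 and 5 describe two mechanisms that are easy to illustrate in isolation: a high-pass response obtained by subtracting a low-pass response from the input, and a split of channels into groups that are each convolved with a different dilation rate before concatenation. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes 1D signals in place of 2D feature maps and a fixed, hand-picked kernel in place of the learned filters, and it omits the channel mixing, refinement, and upsampling steps. All function names are hypothetical.

```python
def low_pass(signal, kernel):
    """Smooth a 1D signal with a normalized kernel (borders are replicated)."""
    total = sum(kernel)
    weights = [w / total for w in kernel]  # normalize so the weights sum to 1
    half = len(weights) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(weights):
            j = min(max(i + k - half, 0), n - 1)  # clamp the index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def high_pass_by_subtraction(signal, kernel):
    """High-pass output = input minus its low-pass response (as in the Figure 5 caption)."""
    smooth = low_pass(signal, kernel)
    return [x - s for x, s in zip(signal, smooth)]


def dilated_conv(signal, kernel, dilation):
    """Same-length 1D convolution whose taps are spaced `dilation` samples apart.

    Positions outside the signal are treated as zero.
    """
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def composite_dilated(channels, kernel, dilations):
    """Split channels into equal groups, one dilation rate per group, then concatenate.

    Assumption: the channel count is divisible by the number of dilation rates.
    """
    if len(channels) % len(dilations) != 0:
        raise ValueError("channel count must be divisible by the number of dilations")
    group = len(channels) // len(dilations)
    out = []
    for g, d in enumerate(dilations):
        for ch in channels[g * group:(g + 1) * group]:
            out.append(dilated_conv(ch, kernel, d))
    return out
```

Applied to a step edge such as `[0, 0, 0, 0, 1, 1, 1, 1]` with the box kernel `[1, 1, 1]`, `high_pass_by_subtraction` is zero inside the flat regions and non-zero only at the two samples adjacent to the edge, which is the behavior the stem relies on to emphasize boundaries. In `composite_dilated`, a larger dilation rate widens the span covered by the same three taps, so different channel groups see context at different scales without extra parameters.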