ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification

Pan Zhang, Baochai Peng, Chaoran Lu, Quanjin Huang

TL;DR

This work tackles robust land cover classification (LCC) from RGB and SAR data by introducing ASANet, an asymmetric fusion framework. It uses a Semantic Focusing Module to compute modality-specific feature weights and a Cascade Fusion Module to fuse information along the channel and spatial dimensions, which lets the network exploit complementary features while mitigating noise. The authors present a new PIE-RGB-SAR dataset and report state-of-the-art performance on PIE-RGB-SAR, DDHR-SK, and WHU-OPT-SAR, including under cloudy and foggy conditions; ASANet runs at 48.7 FPS on 256×256 inputs. Overall, the results highlight the practical potential of asymmetric cross-modal feature alignment for reliable multimodal remote sensing segmentation, and the new dataset provides a resource for further RGB-SAR LCC research.
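The 48.7 FPS figure is easier to reason about as latency. The sketch below converts it to milliseconds per 256×256 image and gives a rough time for a larger scene split into 256×256 tiles. The tiling scheme (non-overlapping tiles, one tile per forward pass), the 1024×1024 scene size, and all function names are illustrative assumptions, and the estimate only holds on the same hardware and settings as the benchmark, which this page does not state.

```python
import math

REPORTED_FPS = 48.7  # throughput reported for 256x256 inputs
TILE = 256           # tile side length in pixels (matches the benchmark input size)


def latency_ms(fps: float) -> float:
    """Average processing time per image, in milliseconds."""
    return 1000.0 / fps


def tiles_needed(height: int, width: int, tile: int = TILE) -> int:
    """Non-overlapping tiles needed to cover a scene (partial tiles count as whole).

    Assumption for illustration: no overlap between tiles.
    """
    return math.ceil(height / tile) * math.ceil(width / tile)


def scene_seconds(height: int, width: int, fps: float = REPORTED_FPS) -> float:
    """Rough time to classify every tile of one scene at the reported rate."""
    return tiles_needed(height, width) / fps


if __name__ == "__main__":
    print(f"{latency_ms(REPORTED_FPS):.2f} ms per 256x256 image")
    print(f"{scene_seconds(1024, 1024):.3f} s for a 1024x1024 scene")
```

Under these assumptions, each 256×256 image takes about 20.5 ms, and a 1024×1024 scene (16 tiles) takes roughly 0.33 s.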

Abstract

Synthetic Aperture Radar (SAR) images have proven to be a valuable cue for multimodal Land Cover Classification (LCC) when combined with RGB images. Most existing studies on cross-modal fusion assume that the two modalities must share consistent feature information, and as a result they construct networks that do not adequately account for the unique characteristics of each modality. In this paper, we propose a novel architecture, the Asymmetric Semantic Aligning Network (ASANet), which introduces asymmetry at the feature level to address the frequent failure of multimodal architectures to fully exploit complementary features. The core of this network is the Semantic Focusing Module (SFM), which explicitly calculates differential weights for each modality to account for modality-specific features. In addition, ASANet incorporates a Cascade Fusion Module (CFM), which examines channel and spatial representations in turn to efficiently select features from the two modalities for fusion. Working together, these two modules enable ASANet to learn feature correlations between the two modalities and to suppress noise caused by feature differences. Comprehensive experiments demonstrate that ASANet achieves excellent performance on three multimodal datasets (PIE-RGB-SAR, DDHR-SK, and WHU-OPT-SAR). Additionally, we have established a new RGB-SAR multimodal dataset, on which ASANet outperforms other mainstream methods with improvements ranging from 1.21% to 17.69%. ASANet runs at 48.7 frames per second (FPS) when the input image is 256×256 pixels. The source code is available at https://github.com/whu-pzhang/ASANet
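The abstract says the SFM "explicitly calculates differential weights for each modality"; its actual design is shown in Figure 3. To illustrate the general idea of asymmetric, per-modality weighting, here is a minimal sketch assuming a simplified squeeze-and-excitation-style channel gate that is instantiated separately for RGB and for SAR. It is not the paper's SFM: the per-channel affine map (in place of the usual two-layer MLP), the parameter values, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical sketch, not the paper's SFM: a simplified squeeze-and-excitation
# style channel gate, with a separate parameter set for each modality.
FeatureMap = List[List[List[float]]]      # [channels][height][width]
Params = Tuple[List[float], List[float]]  # (per-channel scale, per-channel bias)


def global_avg_pool(x: FeatureMap) -> List[float]:
    """Squeeze: average each channel over all spatial positions."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_gate(x: FeatureMap, params: Params) -> FeatureMap:
    """Excitation: turn pooled statistics into per-channel weights in (0, 1).

    A per-channel affine map stands in for the two-layer MLP of standard
    squeeze-and-excitation; scale and bias play the role of learned parameters.
    """
    scale, bias = params
    pooled = global_avg_pool(x)
    weights = [sigmoid(s * p + b) for p, s, b in zip(pooled, scale, bias)]
    return [[[w * v for v in row] for row in ch] for ch, w in zip(x, weights)]


def asymmetric_focus(rgb: FeatureMap, sar: FeatureMap,
                     rgb_params: Params, sar_params: Params) -> Tuple[FeatureMap, FeatureMap]:
    """Asymmetry: each modality is reweighted with its own parameters, so the
    same channel index can be emphasized in RGB and suppressed in SAR."""
    return channel_gate(rgb, rgb_params), channel_gate(sar, sar_params)


if __name__ == "__main__":
    rgb = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]  # 2 channels, 2x2
    sar = [[[2.0, 2.0], [2.0, 2.0]], [[4.0, 4.0], [4.0, 4.0]]]
    rgb_out, sar_out = asymmetric_focus(
        rgb, sar,
        rgb_params=([1.0, 1.0], [0.0, 0.0]),
        sar_params=([-1.0, 0.0], [0.0, 0.0]),
    )
    print(rgb_out[0][0][0], sar_out[0][0][0])
```

Keeping the two parameter sets separate is what makes the weighting asymmetric; a symmetric design would share one set across both modalities.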

Paper Structure

This paper contains 27 sections, 6 equations, 13 figures, and 8 tables.

Figures (13)

  • Figure 1: Comparison of different fusion frameworks. Frameworks are classified by whether features from the two modalities interact directly within the network: a) Indirect Interaction Segmentation Framework and b) Direct Interaction Segmentation Framework.
  • Figure 2: Overall network structure of ASANet. The SFM focuses on the complementary features that each modality contributes independently, and the CFM calibrates and aligns feature information. (⊕ denotes pixel-wise addition.)
  • Figure 3: Diagram of the structure of the SFM.
  • Figure 4: Diagram of the structure of the CFM.
  • Figure 5: PIE-RGB-SAR dataset: RGB image on the left, SAR image in the middle, and ground truth image on the right.
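Figure 4 shows the CFM's actual structure, which is not reproduced here. To make the abstract's "channel, then spatial" selection concrete, here is a minimal sketch of a generic two-stage cascade that ends with the pixel-wise addition (⊕) named in the Figure 2 caption. It is not the paper's CFM: the softmax over pooled channel means, the per-pixel softmax over channel-averaged responses, and all function names are assumptions chosen for illustration, and a real module would use learned layers.

```python
import math
from typing import List, Tuple

# Hypothetical sketch of a generic channel-then-spatial fusion cascade.
# It is NOT the paper's CFM; the weighting rules below are assumptions.
FeatureMap = List[List[List[float]]]  # [channels][height][width]


def softmax2(a: float, b: float) -> Tuple[float, float]:
    """Numerically stable softmax over two scores (one per modality)."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)


def channel_select(rgb: FeatureMap, sar: FeatureMap) -> Tuple[FeatureMap, FeatureMap]:
    """Stage 1 (channel): for each channel, split a weight of 1 between the
    modalities according to their spatially pooled means."""
    out_rgb, out_sar = [], []
    for ch_rgb, ch_sar in zip(rgb, sar):
        n = len(ch_rgb) * len(ch_rgb[0])
        w_rgb, w_sar = softmax2(sum(map(sum, ch_rgb)) / n, sum(map(sum, ch_sar)) / n)
        out_rgb.append([[w_rgb * v for v in row] for row in ch_rgb])
        out_sar.append([[w_sar * v for v in row] for row in ch_sar])
    return out_rgb, out_sar


def spatial_select_and_add(rgb: FeatureMap, sar: FeatureMap) -> FeatureMap:
    """Stage 2 (spatial): at each pixel, weight the modalities by a softmax over
    their channel-averaged responses, then combine them by pixel-wise addition."""
    c, h, w = len(rgb), len(rgb[0]), len(rgb[0][0])
    fused = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            s_rgb = sum(rgb[k][i][j] for k in range(c)) / c
            s_sar = sum(sar[k][i][j] for k in range(c)) / c
            w_rgb, w_sar = softmax2(s_rgb, s_sar)
            for k in range(c):
                fused[k][i][j] = w_rgb * rgb[k][i][j] + w_sar * sar[k][i][j]
    return fused


def cascade_fuse(rgb: FeatureMap, sar: FeatureMap) -> FeatureMap:
    """Run the channel stage, then feed its output into the spatial stage."""
    return spatial_select_and_add(*channel_select(rgb, sar))


if __name__ == "__main__":
    rgb = [[[2.0, 2.0], [2.0, 2.0]]]  # one channel, 2x2, strong response
    sar = [[[0.0, 0.0], [0.0, 0.0]]]  # one channel, 2x2, no response
    print(cascade_fuse(rgb, sar))
```

Because both stages split a unit weight between the modalities, feeding the same map as RGB and SAR returns that map scaled by 0.5; when one modality responds much more strongly, both stages shift weight toward it.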