Multi-scale direction-aware SAR object detection network via global information fusion

Mingxiang Cao, Weiying Xie, Jie Lei, Jiaqing Zhang, Daixun Li, Yunsong Li

TL;DR

SAR-Net is a novel framework designed for global fusion of direction-aware information in SAR object detection. It achieves state-of-the-art results on aircraft and ship datasets, confirming its generalization capability and robustness.

Abstract

Deep learning has driven significant progress in object detection using Synthetic Aperture Radar (SAR) imagery. Existing methods, while achieving promising results, often struggle to effectively integrate local and global information, particularly direction-aware features. This paper proposes SAR-Net, a novel framework specifically designed for global fusion of direction-aware information in SAR object detection. SAR-Net leverages two key innovations: the Unity Compensation Mechanism (UCM) and the Direction-aware Attention Module (DAM). UCM facilitates the establishment of complementary relationships among features across different scales, enabling efficient global information fusion and transmission. Additionally, DAM, through bidirectional attention polymerization, captures direction-aware information, effectively eliminating background interference. Extensive experiments demonstrate the effectiveness of SAR-Net, achieving state-of-the-art results on aircraft (SAR-AIRcraft-1.0) and ship datasets (SSDD, HRSID), confirming its generalization capability and robustness.

Paper Structure

This paper contains 32 sections, 17 equations, 11 figures, 10 tables.

Figures (11)

  • Figure 1: Comparison of information fusion and transmission between our method and existing methods. In the figure, U represents unity operation, C represents feature compensation, and F represents feature fusion. Our perspective enables each layer to focus on global information, while existing fusion methods only focus on local information, resulting in information loss.
  • Figure 2: Overview of the proposed SAR-Net framework. Here, the DAVgg Block embeds our designed direction-aware attention module, and the DA Block consists of DAVgg Blocks, which are combined to extract effective direction-aware information. $C_i$, $K_i$, and $B_i$ represent the multi-scale features at different stages.
  • Figure 3: Overview of backbone components. DAM represents the Direction-aware Attention Module.
  • Figure 4: Multi-level Fusion Module at both shallow and deep levels. Here, MHA and FFN denote Multi-Head Attention and Feed-Forward Network, respectively, with N of them combined in transformer blocks. The Multi-scale Alignment process is depicted on the left side of the diagram, and the concatenation operation is performed along the channel dimension.
  • Figure 5: Architecture of Multi-feature Embedding Module, where Resize represents the average pooling operation or bilinear interpolation.
  • ...and 6 more figures
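
The captions of Figures 4 and 5 mention two concrete operations: a Resize step that is either average pooling or bilinear interpolation, and a concatenation along the channel dimension after multi-scale alignment. The following is a minimal NumPy sketch of that idea only, not the authors' implementation. The function names (`avg_pool`, `bilinear_resize`, `resize`, `align_and_concat`), the `(C, H, W)` layout, the integer downsampling ratio, and the corner-aligned interpolation convention are all assumptions made for illustration.

```python
import numpy as np


def avg_pool(x, factor):
    """Downsample a (C, H, W) map by an integer factor using average pooling."""
    c, h, w = x.shape
    assert h % factor == 0 and w % factor == 0, "assumes an integer ratio"
    blocks = x.reshape(c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4))


def bilinear_resize(x, out_h, out_w):
    """Resize a (C, H, W) map with bilinear interpolation.

    Assumption: corner-aligned sampling (first and last pixels map onto
    each other); other conventions exist and give slightly different values.
    """
    c, h, w = x.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]  # vertical weights, shape (1, out_h, 1)
    wx = (xs - x0)[None, None, :]  # horizontal weights, shape (1, 1, out_w)
    rows0, rows1 = x[:, y0], x[:, y1]
    top = rows0[:, :, x0] * (1 - wx) + rows0[:, :, x1] * wx
    bottom = rows1[:, :, x0] * (1 - wx) + rows1[:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def resize(x, out_h, out_w):
    """Hypothetical Resize: pool larger maps down, interpolate smaller maps up."""
    _, h, w = x.shape
    if (h, w) == (out_h, out_w):
        return x
    if h > out_h:
        return avg_pool(x, h // out_h)
    return bilinear_resize(x, out_h, out_w)


def align_and_concat(features, target_index):
    """Align every scale to the target scale, then concatenate along channels."""
    out_h, out_w = features[target_index].shape[1:]
    aligned = [resize(f, out_h, out_w) for f in features]
    return np.concatenate(aligned, axis=0)  # axis 0 is the channel axis


# Three made-up feature maps at strides that differ by a factor of two.
rng = np.random.default_rng(0)
features = [
    rng.standard_normal((8, 32, 32)),
    rng.standard_normal((16, 16, 16)),
    rng.standard_normal((32, 8, 8)),
]
fused = align_and_concat(features, target_index=1)
print(fused.shape)
```

In this sketch the fused map keeps the spatial size of the chosen target scale, and its channel count is the sum of the input channel counts, so a later stage (for example the transformer blocks named in the Figure 4 caption) would see information from every scale at once.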