FGAA-FPN: Foreground-Guided Angle-Aware Feature Pyramid Network for Oriented Object Detection

Jialin Ma

TL;DR

FGAA-FPN introduces a foreground-guided and angle-aware feature pyramid network for oriented object detection in remote sensing imagery. It applies Foreground-Guided Feature Modulation (FGFM) at low pyramid levels to strengthen object regions and Angle-Aware Multi-Head Attention (AAMHA) at high levels to enforce orientation-consistent feature interaction. With this design, the approach reaches state-of-the-art results on DOTA v1.0 ($\text{mAP}=75.5\%$) and DOTA v1.5 ($\text{mAP}=68.3\%$). Ablation studies confirm that FGFM and AAMHA provide complementary gains and that their hierarchical placement is crucial for performance and efficiency. The method generalizes well across detectors and improves robustness in cluttered scenes with diverse object orientations. The work highlights the value of explicit foreground priors and geometry-aware fusion in multi-scale feature representation for remote sensing detection.

Abstract

With the increasing availability of high-resolution remote sensing and aerial imagery, oriented object detection has become a key capability for geographic information updating, maritime surveillance, and disaster response. However, it remains challenging due to cluttered backgrounds, severe scale variation, and large orientation changes. Existing approaches largely improve performance through multi-scale feature fusion with feature pyramid networks or contextual modeling with attention, but they often lack explicit foreground modeling and do not leverage geometric orientation priors, which limits feature discriminability. To overcome these limitations, we propose FGAA-FPN, a Foreground-Guided Angle-Aware Feature Pyramid Network for oriented object detection. FGAA-FPN is built on a hierarchical functional decomposition that accounts for the distinct spatial resolution and semantic abstraction across pyramid levels, thereby strengthening multi-scale representations. Concretely, a Foreground-Guided Feature Modulation module learns foreground saliency under weak supervision to enhance object regions and suppress background interference in low-level features. In parallel, an Angle-Aware Multi-Head Attention module encodes relative orientation relationships to guide global interactions among high-level semantic features. Extensive experiments on DOTA v1.0 and DOTA v1.5 demonstrate that FGAA-FPN achieves state-of-the-art results, reaching 75.5% and 68.3% mAP, respectively.

Paper Structure (31 sections, 26 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Overall architecture of the proposed FGAA-FPN. Foreground-guided feature modulation is applied at lower pyramid levels to suppress background interference, while angle-aware feature interaction is introduced at higher levels to enhance orientation modeling. This hierarchical design enables effective integration of foreground discrimination and directional reasoning for oriented object detection.
  • Figure 2: Overall structure of FGFM. FGFM takes pyramid features as input and first predicts a foreground probability map through a lightweight estimation branch. The predicted foreground confidence is then calibrated and combined with the original features to generate foreground-guided modulation weights. Finally, these weights are applied to reweight the input features, producing foreground-enhanced representations for subsequent detection.
  • Figure 3: Overall structure of AAMHA. AAMHA applies multi-head self-attention to pyramid features by projecting them into query, key, and value representations. For each attention head, a learnable orientation prototype is introduced to capture a specific directional preference. Based on normalized relative spatial directions between feature locations, an orientation bias is computed and injected into the attention logits, guiding feature interactions toward direction-consistent responses. The attention outputs from all heads are then aggregated and reshaped back to the original feature space, producing orientation-aware features for subsequent prediction.
  • Figure 4: SOTA comparison on DOTA v1.0.
  • Figure 5: SOTA comparison on DOTA v1.5.
  • ...and 2 more figures
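
The captions for Figures 2 and 3 describe two mechanisms: foreground-guided reweighting (FGFM) and an orientation bias injected into attention logits (AAMHA). The pure-Python sketch below illustrates one plausible reading of each; it is not the authors' implementation. Everything not stated in the captions is an assumption made for illustration: the function names (`fgfm`, `angle_bias`, `aamha_head`, `aamha`), the per-pixel linear foreground branch, the temperature-based calibration, the residual gate `1 + alpha * p`, the cosine form of the orientation bias, and the parameters `temperature`, `alpha`, `strength`, and `thetas`. The query/key/value projections and the weak-supervision loss are omitted, and all heads share the same `q`, `k`, `v` to keep the example short.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def fgfm(feat, w, b=0.0, temperature=1.0, alpha=1.0):
    """Foreground-guided modulation of a C x H x W feature map (nested lists).

    Assumed form: p = sigmoid((w . x + b) / temperature), out = x * (1 + alpha * p).
    Returns the reweighted features and the H x W foreground probability map.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # Lightweight estimation branch: a per-pixel linear map over channels.
    prob = [[sigmoid((sum(w[c] * feat[c][y][x] for c in range(C)) + b) / temperature)
             for x in range(W)] for y in range(H)]
    # Residual gate: confident foreground is amplified, background is kept near 1x.
    out = [[[feat[c][y][x] * (1.0 + alpha * prob[y][x]) for x in range(W)]
            for y in range(H)] for c in range(C)]
    return out, prob


def angle_bias(positions, theta, strength=1.0):
    """bias[i][j] = strength * cos(angle(p_j - p_i) - theta); zero on the diagonal.

    positions are (y, x) grid coordinates; theta is measured from the +x axis.
    """
    n = len(positions)
    bias = [[0.0] * n for _ in range(n)]
    for i, (yi, xi) in enumerate(positions):
        for j, (yj, xj) in enumerate(positions):
            dy, dx = yj - yi, xj - xi
            norm = math.hypot(dx, dy)
            if norm > 0.0:
                # Normalized relative direction dotted with the prototype direction.
                bias[i][j] = strength * (dx * math.cos(theta) + dy * math.sin(theta)) / norm
    return bias


def aamha_head(q, k, v, positions, theta, strength=1.0):
    """One attention head with an orientation bias added to the scaled logits."""
    d = len(q[0])
    bias = angle_bias(positions, theta, strength)
    out = []
    for i in range(len(q)):
        logits = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) + bias[i][j]
                  for j in range(len(k))]
        attn = softmax(logits)
        out.append([sum(attn[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out


def aamha(q, k, v, positions, thetas, strength=1.0):
    """Run one head per prototype angle and concatenate the head outputs per token."""
    heads = [aamha_head(q, k, v, positions, t, strength) for t in thetas]
    return [sum((h[i] for h in heads), []) for i in range(len(q))]
```

In this sketch, a head with prototype angle `theta` favors keys that lie in that direction relative to the query, so heads with different angles aggregate context along different orientations. Setting `strength=0.0` recovers ordinary scaled dot-product attention, and setting `alpha=0.0` in `fgfm` leaves the input features unchanged, which makes both additions easy to ablate.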