LR-FPN: Enhancing Remote Sensing Object Detection with Location Refined Feature Pyramid Network

Hanqian Li, Ruinan Zhang, Ye Pan, Junchi Ren, Fei Shen

TL;DR

LR-FPN targets the persistent issue in remote sensing object detection where shallow localization cues and fine-grained contextual interactions are underutilized by standard FPNs. By introducing the Shallow Position Information Extraction Module (SPIEM) and the Contextual Interaction Module (CIM), the approach preserves robust location cues from low-level features and fuses them across scales through spatial and channel interactions, including local and non-local strategies. Empirical results on DOTAV1.0 and HRSC2016 show consistent, state-of-the-art gains in AP50, AP75, and mAP, with ablation studies confirming the additive benefits of SPIEM and CIM and their variants. Overall, LR-FPN delivers a practical, plug-and-play enhancement that improves object localization and contextual understanding in remote sensing detection tasks, with future work extending to transformer-based backbones.

Abstract

Remote sensing target detection aims to identify and locate critical targets within remote sensing images, finding extensive applications in agriculture and urban planning. Feature pyramid networks (FPNs) are commonly used to extract multi-scale features. However, existing FPNs often overlook extracting low-level positional information and fine-grained context interaction. To address this, we propose a novel location refined feature pyramid network (LR-FPN) to enhance the extraction of shallow positional information and facilitate fine-grained context interaction. The LR-FPN consists of two primary modules: the shallow position information extraction module (SPIEM) and the contextual interaction module (CIM). Specifically, SPIEM first maximizes the retention of solid location information of the target by simultaneously extracting positional and saliency information from the low-level feature map. Subsequently, CIM injects this robust location information into different layers of the original FPN through spatial and channel interaction, explicitly enhancing the object area. Moreover, in spatial interaction, we introduce a simple local and non-local interaction strategy to learn and retain the saliency information of the object. Lastly, the LR-FPN can be readily integrated into common object detection frameworks to improve performance significantly. Extensive experiments on two large-scale remote sensing datasets (i.e., DOTAV1.0 and HRSC2016) demonstrate that the proposed LR-FPN is superior to state-of-the-art object detection approaches. Our code and models will be publicly available.
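To make the injection step described in the abstract more concrete, the following is a minimal NumPy sketch of gating an FPN level with a shallow location map through a channel gate and a spatial gate. It is not the authors' implementation. The function names (`channel_gate`, `spatial_gate`, `inject_location`), the sigmoid gates, the average-pool resizing, the residual sum, and the assumption that both maps share the same channel count are all illustrative choices. The learned convolutions and the local and non-local interaction strategy of the actual modules are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def downsample_mean(x, factor):
    """Average-pool a (C, H, W) map by an integer factor (assumed resizing scheme)."""
    c, h, w = x.shape
    assert h % factor == 0 and w % factor == 0
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def channel_gate(loc):
    """Per-channel weights from global average pooling and global max pooling."""
    gap = loc.mean(axis=(1, 2))  # shape (C,)
    gmp = loc.max(axis=(1, 2))   # shape (C,)
    return sigmoid(gap + gmp)[:, None, None]  # shape (C, 1, 1)


def spatial_gate(loc):
    """Per-position weights from the channel-wise mean and max (an assumption)."""
    avg = loc.mean(axis=0)  # shape (H, W)
    mx = loc.max(axis=0)    # shape (H, W)
    return sigmoid(avg + mx)[None, :, :]  # shape (1, H, W)


def inject_location(fpn_feat, shallow_loc):
    """Gate an FPN level with a shallow location map, then add a residual.

    Hypothetical helper: both inputs are (C, H, W) arrays with equal C, and the
    shallow map's spatial size is an integer multiple of the FPN level's size.
    """
    factor = shallow_loc.shape[1] // fpn_feat.shape[1]
    loc = downsample_mean(shallow_loc, factor)
    gated = fpn_feat * channel_gate(loc) * spatial_gate(loc)
    return fpn_feat + gated


rng = np.random.default_rng(0)
shallow = rng.standard_normal((4, 16, 16))  # stand-in for a low-level feature map
feat = rng.standard_normal((4, 8, 8))       # stand-in for one FPN level
out = inject_location(feat, shallow)
print(out.shape)  # (4, 8, 8)
```

Because both gates lie strictly between 0 and 1, each output value in this sketch keeps the sign of the corresponding FPN value and has a magnitude between one and two times the original, so positions and channels that respond strongly in the shallow map are emphasized rather than replaced.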

Paper Structure (18 sections, 12 equations, 6 figures, 4 tables)

This paper contains 18 sections, 12 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Common dataset and remote sensing dataset. In typical datasets, target objects are larger, while in remote sensing datasets, they are comparatively smaller.
  • Figure 2: The architecture of our detector. The outputs of each shallow position information extraction module (SPIEM) are designed to adaptively align with the scales and channels of the feature maps in the backbone network. The extra layers are constructed with $3\times3$ convolutions.
  • Figure 3: The structure of the shallow position information extraction module (SPIEM) and the contextual interaction module (CIM). GAP and GMP denote global average pooling and global max pooling, respectively.
  • Figure 4: The comparison with various FPNs.
  • Figure 5: The comparison with variants of CIM.
  • ...and 1 more figures