LR-FPN: Enhancing Remote Sensing Object Detection with Location Refined Feature Pyramid Network
Hanqian Li, Ruinan Zhang, Ye Pan, Junchi Ren, Fei Shen
TL;DR
LR-FPN targets a persistent issue in remote sensing object detection: standard FPNs underutilize shallow localization cues and fine-grained contextual interactions. By introducing the Shallow Position Information Extraction Module (SPIEM) and the Contextual Interaction Module (CIM), the approach preserves robust location cues from low-level features and fuses them across scales through spatial and channel interactions, including local and non-local strategies. Empirical results on DOTAV1.0 and HRSC2016 show consistent gains in AP50, AP75, and mAP over state-of-the-art detectors, and ablation studies confirm that SPIEM, CIM, and their variants each contribute additively. Overall, LR-FPN is a practical, plug-and-play enhancement that improves object localization and contextual understanding in remote sensing detection, with future work extending to transformer-based backbones.
Abstract
Remote sensing target detection aims to identify and locate critical targets in remote sensing images and has extensive applications in agriculture and urban planning. Feature pyramid networks (FPNs) are commonly used to extract multi-scale features. However, existing FPNs often overlook low-level positional information and fine-grained context interaction. To address this, we propose a novel location refined feature pyramid network (LR-FPN) that enhances the extraction of shallow positional information and facilitates fine-grained context interaction. The LR-FPN consists of two primary modules: the shallow position information extraction module (SPIEM) and the contextual interaction module (CIM). Specifically, SPIEM first retains as much reliable location information about the target as possible by simultaneously extracting positional and saliency information from the low-level feature map. Subsequently, CIM injects this robust location information into different layers of the original FPN through spatial and channel interaction, explicitly enhancing the object area. Moreover, in spatial interaction, we introduce a simple local and non-local interaction strategy to learn and retain the saliency information of the object. Lastly, the LR-FPN can be readily integrated into common object detection frameworks to significantly improve performance. Extensive experiments on two large-scale remote sensing datasets (i.e., DOTAV1.0 and HRSC2016) demonstrate that the proposed LR-FPN is superior to state-of-the-art object detection approaches. Our code and models will be publicly available.
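To make the data flow described in the abstract concrete, the following is a minimal NumPy sketch of the general idea: derive a location map from a shallow feature map, then inject it into each pyramid level through a spatial gate and a channel gate. It is an illustration under stated assumptions, not the authors' implementation. The function names (`inject_location`, `avg_pool_to`), the mean/max mixing used as a stand-in for SPIEM, and the sigmoid gating used as a stand-in for CIM are all hypothetical choices; the abstract does not specify the actual layers, and the local and non-local interaction strategy is not modeled here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def avg_pool_to(x, out_h, out_w):
    """Average-pool a (C, H, W) map to (C, out_h, out_w).

    Assumes H and W are exact multiples of out_h and out_w.
    """
    c, h, w = x.shape
    fh, fw = h // out_h, w // out_w
    return x.reshape(c, out_h, fh, out_w, fw).mean(axis=(2, 4))


def inject_location(shallow, pyramid):
    """Hypothetical stand-in for SPIEM + CIM (not the paper's actual layers)."""
    # SPIEM-like step (assumption): collapse the shallow map into a single
    # location map by mixing the channel mean (positional response) with the
    # channel max (saliency response).
    loc = 0.5 * (shallow.mean(axis=0, keepdims=True)
                 + shallow.max(axis=0, keepdims=True))

    refined = []
    for feat in pyramid:
        _, h, w = feat.shape
        # Spatial interaction (assumption): resize the location map to this
        # level's resolution and turn it into a gate in (0, 1).
        spatial_gate = sigmoid(avg_pool_to(loc, h, w))            # (1, h, w)
        # Channel interaction (assumption): one weight per channel from
        # global average pooling.
        channel_gate = sigmoid(feat.mean(axis=(1, 2)))[:, None, None]  # (C, 1, 1)
        # Residual injection keeps the original FPN feature intact.
        refined.append(feat + feat * spatial_gate * channel_gate)
    return refined


rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 64, 64))                 # low-level feature map
pyramid = [rng.standard_normal((16, s, s)) for s in (32, 16, 8)]
refined = inject_location(shallow, pyramid)
print([r.shape for r in refined])
```

Because the injection is residual and preserves each level's shape, a block of this kind can sit between an existing FPN and a detection head without changing either, which matches the plug-and-play property the abstract claims for LR-FPN.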
