RMK RetinaNet: Rotated Multi-Kernel RetinaNet for Robust Oriented Object Detection in Remote Sensing Imagery

Huiran Sun

TL;DR

The proposed Rotated Multi-Kernel RetinaNet achieves performance comparable to state-of-the-art rotated object detectors while improving robustness in multi-scale and multi-orientation scenarios.

Abstract

Rotated object detection in remote sensing imagery is hindered by three major bottlenecks: non-adaptive receptive field utilization, inadequate long-range multi-scale feature fusion, and discontinuities in angle regression. To address these issues, we propose Rotated Multi-Kernel RetinaNet (RMK RetinaNet). First, we design a Multi-Scale Kernel (MSK) Block to strengthen adaptive multi-scale feature extraction. Second, we incorporate a Multi-Directional Contextual Anchor Attention (MDCAA) mechanism into the feature pyramid to enhance contextual modeling across scales and orientations. Third, we introduce a Bottom-up Path to preserve fine-grained spatial details that are often degraded during downsampling. Finally, we develop an Euler Angle Encoding Module (EAEM) to enable continuous and stable angle regression. Extensive experiments on DOTA-v1.0, HRSC2016, and UCAS-AOD show that RMK RetinaNet achieves performance comparable to state-of-the-art rotated object detectors while improving robustness in multi-scale and multi-orientation scenarios.
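The angle-regression discontinuity named in the abstract can be illustrated with a small sketch. It does not reproduce the paper's EAEM, whose details are not given in this section. It assumes a generic continuous encoding of the doubled angle as a (cosine, sine) pair, and all function names are hypothetical.

```python
import math
from typing import Tuple


def encode_angle(theta: float) -> Tuple[float, float]:
    """Map an angle in radians to (cos 2*theta, sin 2*theta).

    Assumption: the factor 2 reflects that a rectangle looks the same
    after a 180-degree turn. This is not taken from the paper.
    """
    return (math.cos(2.0 * theta), math.sin(2.0 * theta))


def decode_angle(c: float, s: float) -> float:
    """Invert encode_angle, returning an angle in [-pi/2, pi/2]."""
    return 0.5 * math.atan2(s, c)


def raw_gap(a: float, b: float) -> float:
    """Distance between two angles when regressed as plain scalars."""
    return abs(a - b)


def encoded_gap(a: float, b: float) -> float:
    """Euclidean distance between the two encoded angle vectors."""
    ca, sa = encode_angle(a)
    cb, sb = encode_angle(b)
    return math.hypot(ca - cb, sa - sb)


# Two nearly identical boxes on either side of the +/-90 degree boundary.
a = math.radians(89.0)
b = math.radians(-89.0)

print("raw gap (rad):", raw_gap(a, b))
print("encoded gap:  ", encoded_gap(a, b))
```

The two boxes differ by only 2 degrees in orientation, yet the scalar targets are almost 180 degrees apart, so a scalar regression loss would penalize a nearly correct prediction heavily. In the encoded space the two targets sit close together, which is the kind of continuity a stable angle regressor needs.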

Paper Structure

This paper contains 18 sections, 7 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Overview of RMK RetinaNet. The MSK Block consists of four layers, each built from an MSK Module (see the MSK Module section). Only the bottommost MSK Module does not downsample; the other three do. The outputs of the MSK Modules are labeled $M_1$, $M_2$, $M_3$, and $M_4$, and serve as inputs to both the MDCAA Module (see the MDCAA Module section) and the Bottom-up Path Module (see the Bottom-up Path Module section). The MDCAA Module takes $M_1$ through $M_4$ and generates $CP_2$, $CP_3$, and $CP_4$. The Bottom-up Path takes the same inputs, but only its top-level output $N_5$ is retained. Feature maps $CP_4$, $N_5$, $CP_3$, and $CP_2$ are then concatenated with the $C_5$, $C_4$, and $C_3$ feature maps from Rotation RetinaNet at corresponding scales. Finally, the combined features are processed by the Euler Angle Encoding Module (see the Euler Angle Encoding Module section) to produce the final detection results.
  • Figure 2: (a) The MSK Block consists of four MSK Modules. Here, we use $l$ to denote the $l$-th MSK Module (see the MSK Module section), and the output of each MSK Module is denoted as $M_l$. (b) In the MDCAA Module (see the MDCAA Module section), horizontal convolution is denoted as $H$-$Conv$, vertical convolution as $V$-$Conv$, convolution along the main diagonal direction as $Main$-$Diagonal$-$Conv$, and convolution along the anti-diagonal direction as $Anti$-$Diagonal$-$Conv$.
  • Figure 3: Visualization of detection results on the DOTA dataset, demonstrating the model's performance on large-scale, obliquely oriented, and densely arranged objects. The examples include various object categories such as planes, large vehicles (LV), small vehicles (SV), tennis courts (TC), harbors, and swimming pools (SP).
  • Figure 4: Qualitative comparison of detection results on the DOTA dataset. Top row: Rotation RetinaNet produces incorrect and missed detections. Bottom row: our RMK RetinaNet successfully detects all instances, including ships, large vehicles (LV), small vehicles (SV), and storage tanks (ST). See the Qualitative Results section for details.
  • Figure 5: Performance comparison of object detection on the HRSC2016 dataset. We divide the HRSC2016 dataset into 4 categories, and our method achieves 68.77% mAP, significantly outperforming the baseline model.
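
The four directional convolutions named in the Figure 2 caption (horizontal, vertical, main diagonal, anti-diagonal) can be pictured with a toy NumPy sketch. It is an illustration under stated assumptions, not the MDCAA Module itself: the binary 3×3 masks, the zero padding, and the helper names `directional_kernels` and `correlate2d_same` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def directional_kernels(k: int = 3) -> dict:
    """Return four k x k binary masks, one per direction (illustrative only)."""
    if k % 2 == 0:
        raise ValueError("k must be odd so the kernel has a center")
    mid = k // 2
    horizontal = np.zeros((k, k))
    horizontal[mid, :] = 1.0          # middle row
    vertical = np.zeros((k, k))
    vertical[:, mid] = 1.0            # middle column
    main_diagonal = np.eye(k)         # top-left to bottom-right
    anti_diagonal = np.fliplr(np.eye(k))  # top-right to bottom-left
    return {
        "horizontal": horizontal,
        "vertical": vertical,
        "main_diagonal": main_diagonal,
        "anti_diagonal": anti_diagonal,
    }


def correlate2d_same(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded 2-D cross-correlation with output the same size as image."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad)  # constant zero padding on every side
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out


# Toy input: a single line running along the main diagonal.
image = np.eye(5)

# Response of each directional kernel at the center pixel.
responses = {
    name: correlate2d_same(image, kern)[2, 2]
    for name, kern in directional_kernels(3).items()
}
print(responses)
```

On this toy input the main-diagonal kernel responds most strongly at the center, since it is the only mask aligned with the line. This suggests why direction-specific kernels can help with obliquely oriented objects that axis-aligned kernels capture only partially.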