Efficient Oriented Object Detection with Enhanced Small Object Recognition in Aerial Images
Zhifei Shi, Zongyao Yin, Sheng Chang, Xiao Yi, Xianchuan Yu
TL;DR
This work tackles efficient oriented object detection in aerial imagery under resource constraints. It introduces an Adaptive Scale Feature Pyramid (ASFP) that exploits high-resolution P2-level backbone features, OKM-CSP to boost small-object detail, and the wavelet-based C2f-WTC and Ghost-based C2f-GDC modules as lightweight feature-extraction augmentations. The proposed framework has a compact 21.6M parameters and achieves competitive or superior mAP on the DOTAv1.0 (78.32%), DIOR-R (67.32%), and UCAS-AOD (97.86%) benchmarks, outperforming several baselines while remaining efficient. Together, ASFP, C2f-WTC, and C2f-GDC improve small-object detection with minimal computational overhead, making the model well suited to real-time aerial surveillance and other resource-constrained applications. Overall, the method advances practical oriented bounding-box detection in remote sensing by balancing accuracy and efficiency through multi-scale feature fusion and wavelet- and Ghost-based convolutions.
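Part of the lightweight claim rests on Ghost-style convolutions. As a rough illustration only, and not the paper's GhostDynamicConv (whose dynamic-weight details are not given here), the sketch below counts the weights of a standard convolution and of a plain GhostNet-style module. It assumes a primary convolution that produces `c_out / ratio` intrinsic feature maps, followed by depthwise `d x d` "cheap" convolutions that generate the remaining maps. The helper names `conv_params` and `ghost_params` are hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def ghost_params(c_in: int, c_out: int, k: int, ratio: int = 2, d: int = 3) -> int:
    """Weights of a plain GhostNet-style module (bias omitted).

    Assumption: a primary k x k conv produces c_out // ratio intrinsic maps,
    and depthwise d x d convs derive (ratio - 1) extra maps from each one.
    This is a generic Ghost module, not the paper's GhostDynamicConv.
    """
    if c_out % ratio:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k        # ordinary convolution
    cheap = intrinsic * (ratio - 1) * d * d   # depthwise "cheap" operations
    return primary + cheap


if __name__ == "__main__":
    std = conv_params(256, 256, 3)
    ghost = ghost_params(256, 256, 3)
    print(std, ghost, round(std / ghost, 2))  # 589824 296064 1.99
```

With `ratio = 2`, the module needs roughly half the weights of the standard layer, because the depthwise term is tiny compared with the primary convolution.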
Abstract
Balancing computational efficiency and detection accuracy is a significant challenge for rotated bounding-box object detection in aerial imagery. Prior work has produced lightweight models with better computational performance and feature extraction, but these networks still underperform on small and multi-scale objects in remote sensing (RS) imagery. To address this, we present an enhanced YOLOv8 model tailored for oriented object detection and optimized for environments with limited computational resources. The model features a wavelet transform-based C2f module for capturing associative features and an Adaptive Scale Feature Pyramid (ASFP) module that exploits fine details from the P2 layer. In addition, GhostDynamicConv contributes substantially to the model's lightweight design, keeping aerial imagery analysis efficient. With 21.6M parameters, our model is more compact than DecoupleNet (23.3M parameters) while maintaining detection accuracy. On the DOTAv1.0 dataset, it achieves a mean Average Precision (mAP) competitive with leading methods such as DecoupleNet. Its efficiency and reduced parameter count make it a strong candidate for aerial object detection, particularly in resource-constrained environments.
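The wavelet transform-based C2f module is not specified in detail here. The following minimal sketch assumes a single-level 2D Haar transform, a common choice in wavelet-based convolution layers, and shows the decomposition such a module could build on. Each 2x2 pixel block is split into a low-frequency approximation (LL) and three detail subbands at half resolution, and the inverse transform reconstructs the input exactly. The function names are hypothetical, and subband naming conventions (LH versus HL) vary between libraries.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """Single-level orthonormal 2D Haar transform of an (H, W) array.

    Assumption: H and W are even. Returns (ll, lh, hl, hh), each (H/2, W/2).
    """
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("height and width must be even")
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # top rows minus bottom rows
    hl = (a - b + c - d) / 2.0  # left columns minus right columns
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh) -> np.ndarray:
    """Inverse of haar_dwt2: rebuild the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.result_type(ll, 1.0))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


if __name__ == "__main__":
    x = np.arange(16, dtype=float).reshape(4, 4)
    bands = haar_dwt2(x)
    print([band.shape for band in bands])      # [(2, 2), (2, 2), (2, 2), (2, 2)]
    print(np.allclose(haar_idwt2(*bands), x))  # True
```

Because the subbands are at half resolution, a 3x3 kernel applied to them covers a 6x6 region of the original feature map. This is one way wavelet-based layers enlarge the receptive field cheaply. Whether C2f-WTC uses exactly this scheme is an assumption.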
