EDNet: Edge-Optimized Small Target Detection in UAV Imagery -- Faster Context Attention, Better Feature Fusion, and Hardware Acceleration
Zhifan Song, Yuan Zhang, Abd Al Rahman M. Abu Ebayyeh
TL;DR
EDNet targets small-target detection in UAV imagery with an edge-optimized framework built on YOLOv10. Its architectural changes are a C2f-FCA backbone, an XSmall detection head, and a Cross Concat strategy, combined with the WIoUv3 loss for more robust bounding-box regression. Seven model sizes support real-time deployment across edge devices, achieving up to a 5.6% gain in mAP@50 with far fewer parameters and running at 16 to 55 FPS on an iPhone 12. In experiments and ablation studies on VisDrone, EDNet outperforms state-of-the-art YOLO and transformer-based models, remains practical on edge CPUs, and supports hardware-accelerated deployment via CoreML, INT8 quantization, and FP16 runtime. The result is a practical, scalable detector for UAV-based small-target detection, with guidance for deployment on mobile and embedded platforms.
Abstract
Detecting small targets in drone imagery is challenging due to low resolution, complex backgrounds, and dynamic scenes. We propose EDNet, a novel edge-target detection framework built on an enhanced YOLOv10 architecture, optimized for real-time applications without post-processing. EDNet incorporates an XSmall detection head and a Cross Concat strategy to improve feature fusion and multi-scale context awareness for detecting tiny targets in diverse environments. Our unique C2f-FCA block employs Faster Context Attention to enhance feature extraction while reducing computational complexity. The WIoU loss function is employed for improved bounding box regression. With seven model sizes ranging from Tiny to XL, EDNet accommodates various deployment environments, enabling local real-time inference and ensuring data privacy. Notably, EDNet achieves up to a 5.6% gain in mAP@50 with significantly fewer parameters. On an iPhone 12, EDNet variants operate at speeds ranging from 16 to 55 FPS, providing a scalable and efficient solution for edge-based object detection in challenging drone imagery. The source code and pre-trained models are available at: https://github.com/zsniko/EDNet.
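To make the role of the WIoU loss more concrete, the following is a minimal sketch of the distance-attention term from the published Wise-IoU (v1) formulation, written in plain Python for a single pair of boxes. It is an illustration under stated assumptions, not the authors' implementation: the function names `iou` and `wiou_v1` and the `(x1, y1, x2, y2)` box format are chosen here for clarity, and the dynamic focusing factor that later WIoU versions add on top of this term is omitted.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def wiou_v1(pred, target, eps=1e-9):
    """Sketch of the WIoU v1 loss: distance attention times (1 - IoU).

    Assumption: in a training framework the denominator (enclosing-box
    size) is treated as a constant for gradient purposes; plain Python
    has no gradients, so that detail does not appear here.
    """
    # Box centers.
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tx, ty = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    # Attention grows with center distance, normalized by enclosing-box size.
    attention = math.exp(((px - tx) ** 2 + (py - ty) ** 2) / (wg ** 2 + hg ** 2 + eps))
    return attention * (1.0 - iou(pred, target))
```

The `iou` helper also shows why small targets are hard to localize: shifting a 4×4 box by 2 pixels, as in `iou((0, 0, 4, 4), (2, 0, 6, 4))`, gives an IoU of about 0.33, while the same shift on a 40×40 box gives about 0.90. The same pixel error therefore weighs far more heavily on tiny objects, both in the regression loss and in the IoU ≥ 0.5 matching criterion behind mAP@50.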
