EDNet: Edge-Optimized Small Target Detection in UAV Imagery -- Faster Context Attention, Better Feature Fusion, and Hardware Acceleration

Zhifan Song, Yuan Zhang, Abd Al Rahman M. Abu Ebayyeh

TL;DR

EDNet addresses small-target detection in UAV imagery with an edge-optimized framework built on YOLOv10. Its architectural changes (a C2f-FCA backbone, an XSmall detection head, and a Cross Concat strategy) are paired with the WIoUv3 loss for more robust bounding-box regression. Seven scalable variants target real-time deployment across edge devices, achieving up to a $5.6\%$ gain in $mAP@50$ with far fewer parameters and running at $16$–$55$ FPS on an iPhone 12. In experiments and ablation studies on VisDrone, EDNet consistently outperforms state-of-the-art YOLO and transformer-based models, and it supports edge-CPU inference as well as hardware-accelerated deployment via CoreML, INT8 quantization, and FP16 runtime. The work provides a practical, scalable solution for UAV-based small-target detection with clear guidance for deployment on mobile and embedded platforms.

Abstract

Detecting small targets in drone imagery is challenging due to low resolution, complex backgrounds, and dynamic scenes. We propose EDNet, a novel edge-target detection framework built on an enhanced YOLOv10 architecture, optimized for real-time applications without post-processing. EDNet incorporates an XSmall detection head and a Cross Concat strategy to improve feature fusion and multi-scale context awareness for detecting tiny targets in diverse environments. Our unique C2f-FCA block employs Faster Context Attention to enhance feature extraction while reducing computational complexity. The WIoU loss function is employed for improved bounding box regression. With seven model sizes ranging from Tiny to XL, EDNet accommodates various deployment environments, enabling local real-time inference and ensuring data privacy. Notably, EDNet achieves up to a 5.6% gain in mAP@50 with significantly fewer parameters. On an iPhone 12, EDNet variants operate at speeds ranging from 16 to 55 FPS, providing a scalable and efficient solution for edge-based object detection in challenging drone imagery. The source code and pre-trained models are available at: https://github.com/zsniko/EDNet.
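To make the bounding-box regression loss more concrete, here is a minimal sketch of a Wise-IoU style loss in plain Python. It is not taken from the EDNet code: the formulas follow the Wise-IoU formulation as generally described (a distance-based factor in v1, plus a non-monotonic focusing gain in v3), the function names are hypothetical, and the defaults `alpha=1.9` and `delta=3.0` are assumed values that EDNet may or may not use. The running mean of the IoU loss, which a real trainer would track across batches, is passed in as the argument `mean_l_iou`.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def wiou_v1(pred, gt):
    """IoU loss scaled by a factor that grows with the centre distance."""
    l_iou = 1.0 - iou(pred, gt)
    # Centre points of the predicted and ground-truth boxes.
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    denom = wg ** 2 + hg ** 2
    # In an autograd framework the denominator would be detached.
    factor = math.exp(((px - gx) ** 2 + (py - gy) ** 2) / denom) if denom > 0 else 1.0
    return factor * l_iou


def wiou_v3(pred, gt, mean_l_iou, alpha=1.9, delta=3.0):
    """WIoU v1 scaled by a non-monotonic gain based on the outlier degree.

    alpha and delta are assumed defaults, not values confirmed for EDNet.
    mean_l_iou must be positive (running mean of the IoU loss).
    """
    l_iou = 1.0 - iou(pred, gt)
    beta = l_iou / mean_l_iou  # outlier degree of this prediction
    gain = beta / (delta * alpha ** (beta - delta))
    return gain * wiou_v1(pred, gt)
```

For example, `wiou_v3((0, 0, 2, 2), (1, 1, 3, 3), mean_l_iou=0.5)` scores a prediction that is offset diagonally by half a box width. Because the gain is non-monotonic in the outlier degree, predictions of ordinary quality receive the largest weight, while both very good and very poor predictions are down-weighted.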

Paper Structure (17 sections, 11 equations, 8 figures, 3 tables)


Figures (8)

  • Figure 1: Comparison with state-of-the-art (SOTA) models for object detection. Size-mAP (left) and latency-mAP (right).
  • Figure 2: The proposed EDNet framework. The main architecture (backbone-neck-head) is illustrated in the center with a more detailed illustration of each block in the surroundings. ConvBNSiLU: Conv2d + Batch Normalization + SiLU.
  • Figure 3: The proposed C2f-FCA block with Faster Context Attention bottleneck.
  • Figure 4: Sample predictions with the 1.78M EDNet-Tiny model under various scenarios.
  • Figure 5: Performance comparison between EDNet and YOLOv10: (a) mAP gain relative to models of equal or larger size; (b) Parameter reduction compared to larger YOLOv10 models while achieving higher mAP.
  • ...and 3 more figures