EGD-YOLO: A Lightweight Multimodal Framework for Robust Drone-Bird Discrimination via Ghost-Enhanced YOLOv8n and EMA Attention under Adverse Condition

Sudipto Sarkar, Mohammad Asif Hasan, Khondokar Ashik Shahriar, Fablia Labiba, Nahian Tasnim, Sheikh Anawarul Haq Fattah

TL;DR

This paper tackles drone-versus-bird discrimination under adverse conditions by introducing EGD-YOLOv8n, a lightweight RGB-IR fusion detector that integrates GhostConv, GhostBottleneck, EMA attention, and a deformable head (DDetect) into YOLOv8n. Trained and evaluated on the VIP Cup 2025 dataset, the model is compared across RGB, IR, and fusion pipelines; the fusion setting performs best, with $Precision=0.901$, $mAP_{50}=0.885$, and $mAP_{50-95}=0.425$, while maintaining real-time operation on edge GPUs. The approach balances accuracy and efficiency with a compact $\sim$3.5M-parameter footprint and $<30$ ms per frame; the RGB, IR, and fusion models run at $57.5$, $56.2$, and $54.8$ FPS respectively. The results demonstrate the practical viability of robust, multimodal drone surveillance under distortions, with fusion yielding the strongest gains and with temporal tracking and quantization identified as directions for future work.
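As a quick sanity check on the speed figures, the reported throughput can be converted to per-frame latency. The sketch below uses only the FPS values quoted in the TL;DR; the variable and function names are illustrative, and the simple reciprocal conversion assumes frames are processed one at a time with no batching or pipelining.

```python
# Reported throughput per modality, taken from the TL;DR above.
reported_fps = {"RGB": 57.5, "IR": 56.2, "Fusion": 54.8}


def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds, assuming one frame at a time."""
    return 1000.0 / fps


latencies = {name: latency_ms(fps) for name, fps in reported_fps.items()}

for name, ms in latencies.items():
    # Each value should sit well under the 30 ms budget quoted in the summary.
    print(f"{name}: {ms:.2f} ms per frame (under 30 ms: {ms < 30.0})")
```

Under that assumption, all three modalities come out between roughly 17 and 19 ms per frame, which is consistent with the stated $<30$ ms bound.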

Abstract

Identifying drones and birds correctly is essential for keeping the skies safe and improving security systems. Using the VIP CUP 2025 dataset, which provides both RGB and infrared (IR) images, this study presents EGD-YOLOv8n, a new lightweight yet powerful model for object detection. The model improves how image features are captured and understood, making detection more accurate and efficient. It uses smart design changes and attention layers to focus on important details while reducing the amount of computation needed. A special detection head helps the model adapt to objects of different shapes and sizes. We trained three versions: one using RGB images, one using IR images, and one combining both. The combined model achieved the best accuracy and reliability while running fast enough for real-time use on common GPUs.
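To give a rough sense of why Ghost-style convolutions reduce computation, the sketch below compares the weight count of a standard convolution with that of a Ghost convolution. It is an illustration under stated assumptions, not the paper's configuration: the layer sizes are hypothetical, biases and batch-norm parameters are ignored, and the "cheap" operation is assumed to be a 5x5 depthwise convolution producing half of the output channels, as in common GhostConv implementations.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a k x k convolution (no bias, no batch norm)."""
    return (c_in // groups) * c_out * k * k


def ghost_conv_params(c_in: int, c_out: int, k: int, cheap_k: int = 5) -> int:
    """Weight count of a Ghost convolution (assumed layout, see lead-in).

    Half of the output channels come from an ordinary k x k convolution;
    the other half are generated from those by a depthwise cheap_k x cheap_k
    convolution, and the two halves are concatenated.
    """
    hidden = c_out // 2
    primary = conv_params(c_in, hidden, k)
    cheap = conv_params(hidden, hidden, cheap_k, groups=hidden)
    return primary + cheap


# Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernel.
standard = conv_params(64, 128, 3)
ghost = ghost_conv_params(64, 128, 3)
print(f"standard: {standard} weights, ghost: {ghost} weights")
print(f"ghost / standard = {ghost / standard:.3f}")
```

For this hypothetical layer the Ghost variant needs a little over half the weights of the standard convolution, which is the kind of saving that keeps the overall model compact.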

Paper Structure

This paper contains 14 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Detection Data Distribution
  • Figure 2: Detection Pipeline of 3 Modalities
  • Figure 3: Data Distribution for Training
  • Figure 4: EGD Model Architecture
  • Figure 5: RGB Modality (a, b), IR Modality (c), Fusion Modality (d, e, f)