YOLOMG: Vision-based Drone-to-Drone Detection with Appearance and Pixel-Level Motion Fusion

Hanqing Guo, Xiuxiu Lin, Shiyu Zhao

TL;DR

This work tackles the challenge of detecting extremely small drones in complex scenes with substantial camera ego-motion. It introduces YOLOMG, a motion-guided detector that fuses a pixel-level motion difference map with RGB appearance through a bimodal adaptive fusion module, built on a lightweight YOLOv5-based backbone. The authors validate their approach on the ARD100 and NPS-Drones datasets, reporting superior AP and robust generalization, including under low-light conditions. The work has practical implications for real-time, reliable drone detection in aerial applications and contributes ARD100 as a new, challenging benchmark for future research.
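To make the idea of adaptive bimodal fusion concrete, here is a minimal sketch that is not the module from the paper or its repository. It assumes each modality's features are given as a list of flattened channels, and it replaces learned fusion weights with a parameter-free softmax over each channel's globally pooled response, so the modality with the stronger response contributes more. The names `adaptive_fuse`, `global_avg`, and `temperature` are hypothetical.

```python
import math
from typing import List

Channel = List[float]  # one flattened feature channel (H*W values)


def global_avg(channel: Channel) -> float:
    """Global average pooling over one channel."""
    return sum(channel) / len(channel)


def adaptive_fuse(rgb: List[Channel], motion: List[Channel],
                  temperature: float = 1.0) -> List[Channel]:
    """Hypothetical per-channel weighted sum of RGB and motion features.

    For each channel, the pooled responses of the two modalities are turned
    into a pair of weights with a softmax (weights sum to 1). A real fusion
    module would compute these weights with trainable layers instead.
    """
    if len(rgb) != len(motion):
        raise ValueError("modalities must have the same number of channels")
    fused = []
    for rgb_c, mot_c in zip(rgb, motion):
        a = global_avg(rgb_c) / temperature
        b = global_avg(mot_c) / temperature
        m = max(a, b)  # subtract the max for numerical stability
        ea, eb = math.exp(a - m), math.exp(b - m)
        w_rgb, w_mot = ea / (ea + eb), eb / (ea + eb)
        fused.append([w_rgb * x + w_mot * y for x, y in zip(rgb_c, mot_c)])
    return fused


# Channel 0: equal pooled responses, so both modalities get weight 0.5.
# Channel 1: the motion channel responds more strongly and dominates.
rgb = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
motion = [[0.0, 0.0, 0.0, 4.0], [2.0, 2.0, 2.0, 2.0]]
fused = adaptive_fuse(rgb, motion)
print(fused[0])  # [0.5, 0.5, 0.5, 2.5]
```

Because the two weights always sum to 1, the fused channel stays on the same scale as its inputs, while a strong motion response (for example, a tiny moving drone) can pull the fused feature toward the motion modality.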

Abstract

Vision-based drone-to-drone detection has attracted increasing attention due to its importance in numerous tasks such as vision-based swarming, aerial see-and-avoid, and malicious drone detection. However, existing methods often fail when the background is complex or the target is tiny. This paper proposes a novel end-to-end framework that accurately identifies small drones in complex environments using motion guidance. It first creates a motion difference map to capture the motion characteristics of tiny drones. Next, this motion difference map is combined with an RGB image by a bimodal fusion module, allowing adaptive feature learning of the drone. Finally, the fused feature map is processed by an enhanced backbone and detection head based on the YOLOv5 framework to produce the detection results. To validate our method, we propose a new dataset, named ARD100, which comprises 100 videos (202,467 frames) covering various challenging conditions and has the smallest average object size among existing drone detection datasets. Extensive experiments on the ARD100 and NPS-Drones datasets show that our proposed detector performs well under challenging conditions and surpasses state-of-the-art algorithms across various metrics. We publicly release the code and the ARD100 dataset at https://github.com/Irisky123/YOLOMG.
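The first step of the pipeline, building a motion difference map, can be approximated with classic three-frame differencing. This is a minimal sketch under stated assumptions, not the paper's motion feature enhancement module: frames are grayscale, already aligned (the camera ego-motion that YOLOMG targets is not compensated here), and a fixed threshold of 15 intensity levels is used. `motion_difference_map` and `make_frame` are hypothetical helpers.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def motion_difference_map(prev: Frame, curr: Frame, nxt: Frame,
                          threshold: float = 15.0) -> List[List[int]]:
    """Hypothetical three-frame differencing on pre-aligned frames.

    A pixel of `curr` is marked as moving only if it differs from BOTH the
    previous and the next frame by more than `threshold`.
    """
    rows, cols = len(curr), len(curr[0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            d_prev = abs(curr[r][c] - prev[r][c])
            d_next = abs(nxt[r][c] - curr[r][c])
            # Taking the minimum suppresses "ghost" responses that show up
            # in only one of the two pairwise differences.
            if min(d_prev, d_next) > threshold:
                mask[r][c] = 1
    return mask


def make_frame(drone_col: int, size: int = 5, bg: float = 50.0) -> Frame:
    """Hypothetical test frame: flat background plus a 1-pixel bright drone."""
    frame = [[bg] * size for _ in range(size)]
    frame[2][drone_col] = 200.0
    return frame


# The "drone" moves one pixel to the right per frame.
mask = motion_difference_map(make_frame(1), make_frame(2), make_frame(3))
print(mask[2])  # [0, 0, 1, 0, 0]
```

Only the drone's current position is flagged; its previous and next positions each appear in just one pairwise difference and are rejected. With a moving camera, the frames would first need to be aligned, otherwise the static background would also produce large differences.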

Paper Structure

This paper contains 21 sections, 1 equation, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Illustration of various challenges in drone-to-drone detection. (a) shows a drone that is easily confused with the background, (b) shows that the objects in our ARD100 dataset are extremely small, and (c) shows the motion blur caused by the camera's ego-motion (the right-hand images are best viewed at 700% zoom).
  • Figure 2: The overall architecture of our proposed YOLOMG algorithm. First, a motion feature enhancement module extracts the motion difference map of drones. Next, a bimodal fusion module adaptively combines the RGB and motion features. Then, the fused feature map is passed to the lightweight YOLO backbone for deep feature extraction and processed by the Feature Pyramid Network (FPN) for cross-layer fusion. Finally, the feature maps are fed to the detection head to produce the detection results.
  • Figure 3: An example of the motion difference map. The bottom-left image is the cropped RGB image and the bottom-right image is the cropped motion difference map (the bottom images are best viewed at 300% zoom). The yellow boxes enclose the target drone. The blue circle encloses the interruptions.
  • Figure 4: The structure of our proposed YOLOMG network.
  • Figure 5: Representative images from the ARD100 dataset. The first row shows complex backgrounds, the second row shows tiny objects (usually smaller than 12×12 pixels), the third row shows severe motion blur caused by abrupt camera movement, and the fourth row shows drones under low-light conditions.
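Tiny targets such as the sub-12×12-pixel drones in ARD100 are unforgiving for box-based metrics like AP, because a localization error of a couple of pixels costs far more overlap on a small box than on a large one. The sketch below computes intersection over union (IoU) for a 2-pixel diagonal offset on a 10×10 and a 100×100 box; the 0.5 IoU matching threshold mentioned here is a common convention and an assumption, not necessarily the paper's evaluation setting. `iou` and `shifted` are hypothetical helpers.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def shifted(box: Box, dx: float, dy: float) -> Box:
    """Translate a box by (dx, dy) pixels."""
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


for side in (10, 100):
    gt = (0.0, 0.0, float(side), float(side))
    pred = shifted(gt, 2.0, 2.0)  # a 2-pixel error in both x and y
    print(side, round(iou(gt, pred), 3))
# prints:
# 10 0.471
# 100 0.924
```

The same 2-pixel error drops the small box below a 0.5 IoU threshold (a missed detection under that criterion) while barely affecting the large one, which is one reason small-object detection is scored so harshly.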