Neuromorphic Drone Detection: an Event-RGB Multimodal Approach

Gabriele Magrini, Federico Becattini, Pietro Pala, Alberto Del Bimbo, Antonio Porta

TL;DR

The paper presents a novel model that fuses the event and RGB domains, leveraging multimodal data to take advantage of the best of both worlds, and releases a novel spatio-temporally synchronized Event-RGB drone detection dataset.

Abstract

In recent years, drone detection has quickly become a subject of extreme interest: the potential for small, fast-moving objects to be used with malicious intent, or even in terrorist attacks, has drawn attention to the need for precise and resilient systems for detecting and identifying them. While extensive literature exists on object detection based on RGB data, it is also critical to recognize the limits of this modality when applied to UAV detection. Detecting drones indeed poses several challenges, such as fast-moving objects and scenes with a high dynamic range or, even worse, scarce illumination. Neuromorphic cameras, on the other hand, can retain precise and rich spatio-temporal information in situations that are challenging for RGB cameras. They are resilient to both high-speed motion and scarce illumination, but are prone to a rapid loss of information when the objects in the scene are static. In this context, we present a novel model that integrates the two domains, leveraging multimodal data to take advantage of the best of both worlds. To this end, we also release NeRDD (Neuromorphic-RGB Drone Detection), a novel spatio-temporally synchronized Event-RGB drone detection dataset with more than 3.5 hours of annotated multimodal recordings.

Paper Structure (19 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Pooling-based fusion approach. We pool the features after a cut-off layer (the encoder) and process the blended features with the final part of the model.
  • Figure 2: Asymmetric modality injection. The main domain (event) is informed about the complementary domain (RGB) thanks to a cross-attention mechanism that blends the features asymmetrically.
  • Figure 3: Symmetric fusion architecture. Two asymmetric injections are performed, to inform the two modalities of each other. A final pooling layer is used to merge the features symmetrically.
  • Figure 4: Example of aligned frames in the dataset. The images on the left are the RGB frames, while the images on the right are the event frames obtained by accumulating all the events in a time slice of 33 ms. The red boxes are the ground truth bounding boxes for detection, with identical coordinates in both domains. Crops of the original images are displayed for better visualization.
  • Figure 5: Results of the main methods in various scenarios. The ground truth bounding box is shown in red, while the other colored boxes are the outputs of the model. EV-to-RGB and RGB-to-EV represent the results of the asymmetric modality injection fusion strategy. When only the red ground truth box is displayed, the model fails to detect the drone. Crops of the original images are displayed for better visualization.
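
The caption of Figure 4 describes event frames as the accumulation of all events within a 33 ms time slice. A minimal sketch of that accumulation step is shown below. It is not the authors' code: the function name `accumulate_events`, the microsecond timestamps, the `(t, x, y)` array layout, and the choice to count events per pixel while ignoring polarity are all assumptions made for illustration.

```python
import numpy as np


def accumulate_events(t_us, x, y, t_start_us, height, width, slice_us=33_000):
    """Build one event frame by counting events per pixel in a time slice.

    Hypothetical helper: timestamps are assumed to be in microseconds and
    polarity is ignored; the source does not specify either detail.
    """
    t_us = np.asarray(t_us)
    x = np.asarray(x)
    y = np.asarray(y)

    # Keep only events inside the half-open window [t_start, t_start + slice).
    in_slice = (t_us >= t_start_us) & (t_us < t_start_us + slice_us)

    frame = np.zeros((height, width), dtype=np.int32)
    # np.add.at performs an unbuffered add, so repeated pixel coordinates
    # are each counted rather than overwritten.
    np.add.at(frame, (y[in_slice], x[in_slice]), 1)
    return frame


# Toy event stream: five events, the last two fall outside the first slice.
t = np.array([0, 10_000, 32_999, 33_000, 40_000])
x = np.array([1, 1, 2, 0, 3])
y = np.array([0, 0, 1, 2, 2])

frame = accumulate_events(t, x, y, t_start_us=0, height=3, width=4)
print(frame)
```

Because the RGB and event streams are described as spatio-temporally synchronized, a frame built this way can share the same bounding box coordinates as the corresponding RGB image; a 33 ms slice roughly matches the frame period of a 30 fps camera.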