NU-AIR -- A Neuromorphic Urban Aerial Dataset for Detection and Localization of Pedestrians and Vehicles

Craig Iaboni, Thomas Kelly, Pramod Abichandani

TL;DR

NU-AIR delivers an open-source neuromorphic aerial dataset for urban pedestrian and vehicle detection, captured with a drone-mounted event camera across daylight and night conditions and organized into 283 15-second clips with 93,204 bounding-box annotations. The authors evaluate ten frame-based DNNs, three SNNs with voxel-cube encoding, and a Recurrent Vision Transformer, reporting COCO-style mAP and latency metrics, and provide open-source code for voxelization and model training. They find frame-based DNNs generally outperform SNNs on NU-AIR, while ablation studies reveal how depth, bias, pooling, and normalization affect SNN performance, and they demonstrate a fast RVT baseline with strong latency characteristics. Limitations include evaluations on GPUs rather than edge neuromorphic hardware, data from a single city, and drone-induced artifacts, with future work aimed at multi-city data, multi-modal sensing, and segmentation tasks. Overall, NU-AIR offers a valuable benchmark for neuromorphic urban perception and informs practical deployment considerations for aerial, event-based vision systems.

Abstract

This paper presents an open-source aerial neuromorphic dataset that captures pedestrians and vehicles moving in an urban environment. The dataset, titled NU-AIR, features 70.75 minutes of event footage acquired with a 640 x 480 resolution neuromorphic sensor mounted on a quadrotor operating in an urban environment. Crowds of pedestrians, different types of vehicles, and street scenes featuring busy urban environments are captured at different elevations and illumination conditions. Manual bounding box annotations of the vehicles and pedestrians in the recordings are provided at a frequency of 30 Hz, yielding 93,204 labels in total. The dataset's fidelity is evaluated through a comprehensive ablation study of three Spiking Neural Networks (SNNs) and by training ten Deep Neural Networks (DNNs), validating the quality and reliability of both the dataset and its annotations. All data and the Python code to voxelize the data and subsequently train the SNNs/DNNs have been open-sourced.
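
The abstract mentions voxelizing the event data before training. As a rough illustration of what that step can look like, here is a minimal sketch that bins events into a fixed number of time slices per polarity. It is not the authors' released code: the function name `voxelize_events`, the `(t, x, y, p)` event layout, the default of 5 time bins, and the count-based accumulation are all assumptions made for this example. Only the 640 x 480 sensor resolution comes from the text above.

```python
import numpy as np


def voxelize_events(t, x, y, p, num_bins=5, height=480, width=640):
    """Accumulate events into a (num_bins, 2, height, width) count grid.

    Hypothetical helper, not the NU-AIR reference implementation.
    t: timestamps (any unit), x/y: pixel coordinates, p: polarity in {0, 1}.
    """
    t = np.asarray(t, dtype=np.float64)
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    p = np.asarray(p, dtype=np.int64)

    grid = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    if t.size == 0:
        return grid

    # Normalize timestamps to [0, 1] over the window, then map to a bin index.
    t0 = t.min()
    span = max(t.max() - t0, 1e-9)  # guard against a zero-length window
    bins = ((t - t0) / span * num_bins).astype(np.int64)
    bins = np.minimum(bins, num_bins - 1)  # the latest event falls in the last bin

    # np.add.at handles repeated indices, so events at the same voxel are summed.
    np.add.at(grid, (bins, p, y, x), 1.0)
    return grid


# Four synthetic events spread over two time bins.
example = voxelize_events(
    t=[0, 10, 20, 30], x=[1, 1, 2, 639], y=[0, 0, 3, 479], p=[0, 1, 1, 0],
    num_bins=2,
)
```

Keeping the two polarities in separate channels is one common choice; a signed single-channel grid is an equally plausible alternative.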

Paper Structure (28 sections, 8 figures, 2 tables)

Figures (8)

  • Figure 1: (Left Top and Bottom) The DJI M-100 quadrotor was used to record pedestrians and vehicles in urban environments. A safety rope was affixed to the quadrotor during all flight operations. The recording quadrotor flew at varying heights above a city intersection (Top Mid and Right), a walking path (Bottom Mid), and a campus center (Bottom Right). Bounding boxes created by manual annotators were drawn over frames.
  • Figure 2: Spatial resolution of the camera as a function of distance, measured over a range of 3.05 to 18.29 meters.
  • Figure 3: The DJI Matrice M100 quadrotor with a forward-facing Prophesee Gen3.1 VGA event camera was used for data collection.
  • Figure 4: Sample images from the NU-AIR dataset showcasing the diversity of urban environments and the intricate challenges presented by drone-based neuromorphic imaging.
  • Figure 5: The ratio of width to height, known as the aspect ratio, is displayed for the manually labeled bounding boxes of pedestrians (top) and vehicles (bottom) in the dataset. The histograms for training, validation, and testing subsets of pedestrian and vehicle subjects are depicted.
  • ...and 3 more figures