MEVDT: Multi-Modal Event-Based Vehicle Detection and Tracking Dataset

Zaid A. El Shair, Samir A. Rawashdeh

TL;DR

The paper introduces MEVDT, a multi-modal event-based vehicle detection and tracking dataset that couples DAVIS 240c event streams with grayscale APS frames. It provides 63 sequences with roughly 13k images, 5M events, 10k object labels, and 85 continuous object trajectories annotated at 24 Hz, across two scenes (A and B) with an 80/20 train/test split. MEVDT offers multiple label formats (COCO, MOT, and a custom per-frame format) and fixed-duration event samples (AEDAT 3.1) to facilitate both detection and tracking research in automotive contexts. By enabling precise cross-modal alignment and high temporal resolution labels, MEVDT supports development and evaluation of event-based and multi-modal perception methods under dynamic traffic conditions.

Abstract

In this data article, we introduce the Multi-Modal Event-based Vehicle Detection and Tracking (MEVDT) dataset. This dataset provides a synchronized stream of event data and grayscale images of traffic scenes, captured using the Dynamic and Active-Pixel Vision Sensor (DAVIS) 240c hybrid event-based camera. MEVDT comprises 63 multi-modal sequences with approximately 13k images, 5M events, 10k object labels, and 85 unique object tracking trajectories. Additionally, MEVDT includes manually annotated ground truth labels (object classifications, pixel-precise bounding boxes, and unique object IDs), which are provided at a labeling frequency of 24 Hz. Designed to advance the research in the domain of event-based vision, MEVDT aims to address the critical need for high-quality, real-world annotated datasets that enable the development and evaluation of object detection and tracking algorithms in automotive environments.

Paper Structure (12 sections, 4 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: MEVDT dataset directory structure and organization.
  • Figure 2: Satellite view of a subsection of the University of Michigan-Dearborn campus highlighting Scene A and Scene B, where data was collected, along with the position of the LiDAR sensor.
  • Figure 3: Sample image outputs from the dataset demonstrating the two distinct scenes, including (a) Scene A and (b) Scene B, showcasing the camera's perspective and field of view for each location within the University of Michigan-Dearborn's campus.
  • Figure 4: The data collection setup showing the hybrid event camera (DAVIS 240c) mounted on a tripod at the edge of a building overlooking the street and part of the parking lot. A laptop adjacent to the camera setup is used for data recording and sensor control.
  • Figure 5: Samples from the dataset showing labeled vehicles. Each image demonstrates the APS intensity frame with superimposed events from the DVS collected in the last ~43 ms, where blue and red pixels visualize positive and negative events, respectively. The samples include various vehicle types such as (a) SUVs, (b) trucks, (c) vans, and (d) pickup trucks captured in two different scenes (Scene A for the top row and Scene B for the bottom row). The presence of multiple objects and vehicles at different speeds (e-f) illustrates the dataset's utility for object detection and tracking research.
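
The visualization described in the Figure 5 caption can be approximated with a few lines of NumPy. The sketch below is illustrative only and is not the authors' rendering code: the function name `overlay_events`, the `(t_us, x, y, polarity)` column layout, the microsecond timestamps, and the 0/1 polarity encoding are all assumptions, and the dataset's actual event files would need to be decoded into such an array first. The 240x180 frame size matches the DAVIS 240c sensor resolution.

```python
import numpy as np


def overlay_events(frame, events, t_end_us, window_us=43_000):
    """Return an RGB image: a grayscale frame with recent events drawn on top.

    frame:    (H, W) uint8 grayscale APS image.
    events:   (N, 4) integer array with assumed columns (t_us, x, y, polarity),
              where polarity 1 = positive and 0 = negative (hypothetical layout).
    t_end_us: timestamp of the frame in microseconds.
    """
    # Replicate the gray channel three times to get an (H, W, 3) RGB image.
    rgb = np.stack([frame] * 3, axis=-1)

    t, x, y, p = events[:, 0], events[:, 1], events[:, 2], events[:, 3]

    # Keep only events inside the window (t_end_us - window_us, t_end_us].
    recent = (t > t_end_us - window_us) & (t <= t_end_us)
    pos = recent & (p == 1)
    neg = recent & (p == 0)

    # Positive events in blue, negative events in red. Negative events are
    # drawn last, so they win at pixels that received both polarities.
    rgb[y[pos], x[pos]] = (0, 0, 255)
    rgb[y[neg], x[neg]] = (255, 0, 0)
    return rgb


# Synthetic example: a flat gray frame and three made-up events.
frame = np.full((180, 240), 128, dtype=np.uint8)
events = np.array([
    [1_000, 10, 20, 1],   # outside the 43 ms window, ignored
    [60_000, 30, 40, 1],  # positive event, drawn blue
    [90_000, 50, 60, 0],  # negative event, drawn red
])
rgb = overlay_events(frame, events, t_end_us=100_000)
```

Drawing one polarity after the other is the simplest choice; a renderer that should show the most recent polarity at each pixel would instead sort the events by timestamp and paint them in order.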