MEVDT: Multi-Modal Event-Based Vehicle Detection and Tracking Dataset
Zaid A. El Shair, Samir A. Rawashdeh
TL;DR
The paper introduces MEVDT, a multi-modal event-based vehicle detection and tracking dataset that couples DAVIS 240c event streams with grayscale APS frames. It provides 63 sequences with roughly 13k images, 5M events, 10k object labels, and 85 continuous object trajectories annotated at 24 Hz, across two scenes (A and B) with an 80/20 train/test split. MEVDT offers multiple label formats (COCO, MOT, and a custom per-frame format) and fixed-duration event samples (AEDAT 3.1) to facilitate both detection and tracking research in automotive contexts. By enabling precise cross-modal alignment and high temporal resolution labels, MEVDT supports development and evaluation of event-based and multi-modal perception methods under dynamic traffic conditions.
Abstract
In this data article, we introduce the Multi-Modal Event-based Vehicle Detection and Tracking (MEVDT) dataset. This dataset provides a synchronized stream of event data and grayscale images of traffic scenes, captured using the Dynamic and Active-Pixel Vision Sensor (DAVIS) 240c hybrid event-based camera. MEVDT comprises 63 multi-modal sequences with approximately 13k images, 5M events, 10k object labels, and 85 unique object tracking trajectories. Additionally, MEVDT includes manually annotated ground truth labels (object classifications, pixel-precise bounding boxes, and unique object IDs), which are provided at a labeling frequency of 24 Hz. Designed to advance research in event-based vision, MEVDT aims to address the critical need for high-quality, real-world annotated datasets that enable the development and evaluation of object detection and tracking algorithms in automotive environments.
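To illustrate how an event stream might be paired with labels provided at 24 Hz, the following minimal sketch groups time-sorted events into consecutive fixed-duration windows, one per labeling interval. The event tuple layout `(t_us, x, y, polarity)`, the function name `slice_events`, and the choice of a window equal to one label interval (1/24 s) are assumptions made for illustration; they are not taken from the dataset's files or tooling.

```python
from bisect import bisect_left

LABEL_RATE_HZ = 24                      # labeling frequency stated in the abstract
WINDOW_US = 1_000_000 / LABEL_RATE_HZ   # one label interval, about 41,667 microseconds


def slice_events(events, t_start_us, num_windows, window_us=WINDOW_US):
    """Group time-sorted events into consecutive fixed-duration windows.

    events: list of (t_us, x, y, polarity) tuples sorted by timestamp
            (a hypothetical layout, assumed for this sketch).
    Window k covers the half-open interval
    [t_start_us + k * window_us, t_start_us + (k + 1) * window_us).
    """
    timestamps = [e[0] for e in events]
    windows = []
    for k in range(num_windows):
        # Binary search for the first event at or after each window boundary.
        lo = bisect_left(timestamps, t_start_us + k * window_us)
        hi = bisect_left(timestamps, t_start_us + (k + 1) * window_us)
        windows.append(events[lo:hi])
    return windows


# Toy events (made-up values, not dataset samples).
events = [
    (0, 10, 20, 1),
    (20_000, 11, 20, 0),
    (41_000, 12, 21, 1),
    (42_000, 13, 21, 1),
    (90_000, 14, 22, 0),
]
windows = slice_events(events, t_start_us=0, num_windows=3)
print([len(w) for w in windows])  # → [3, 1, 1]
```

Each resulting window can then be associated with the label set whose timestamp closes that interval. Whether a label should be paired with the events preceding it, centered on it, or following it is a design choice this sketch does not settle.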
