SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception
Manideep Reddy Aliminati, Bharatesh Chakravarthi, Aayush Atul Verma, Arpitsinh Vaghela, Hua Wei, Xuesong Zhou, Yezhou Yang
TL;DR
SEVD addresses the scarcity of synthetic, multi-view event-based driving data. Using the CARLA simulator, it produces synchronized $\langle x, y, p, t \rangle$ event streams from six ego-vehicle DVS cameras and four fixed DVS sensors, together with RGB, depth, optical flow, semantic segmentation, and instance segmentation data. The dataset provides extensive 2D and 3D bounding-box annotations in COCO, Pascal VOC, and KITTI formats across diverse lighting conditions, weather conditions, and scene types. It comprises $27\,\text{h}$ of fixed-perception and $31\,\text{h}$ of ego-perception event data (in addition to the other sensor modalities) and over $9\,\text{M}$ bounding boxes. Baselines with state-of-the-art event-based detectors (RVT, RED) and a frame-based detector (YOLOv8) establish 2D detection benchmarks and indicate synthetic-to-real generalization potential, including transfer to real Prophesee data. SEVD thus supports the development and robust evaluation of multi-view, high-temporal-resolution perception for autonomous driving and vehicle-to-infrastructure (V2I) applications, including research on occlusion handling, domain shifts, and cooperative perception.
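Each DVS sample is a tuple $\langle x, y, p, t \rangle$: pixel coordinates, polarity, and timestamp. Frame-based detectors such as YOLOv8 expect images, so one common approach is to accumulate the events in a time window into per-polarity count images. The sketch below shows this idea in pure Python. It is a minimal illustration, not SEVD's loader or preprocessing. The `Event` type and the `polarity_histogram` helper are hypothetical, and the polarity encoding (1 = brightness increase, 0 = decrease) is an assumption.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """One DVS event: pixel (x, y), polarity p, timestamp t."""
    x: int
    y: int
    p: int  # assumed encoding: 1 = brightness increase, 0 = decrease
    t: int  # timestamp; units depend on the data source


def polarity_histogram(events, width, height, t_start, t_end):
    """Count events per pixel and polarity inside [t_start, t_end).

    Returns hist[c][y][x]: channel 0 holds negative events, channel 1 positive.
    Events outside the window or the sensor bounds are ignored.
    """
    hist = [[[0] * width for _ in range(height)] for _ in range(2)]
    for ev in events:
        if t_start <= ev.t < t_end and 0 <= ev.x < width and 0 <= ev.y < height:
            hist[1 if ev.p > 0 else 0][ev.y][ev.x] += 1
    return hist


if __name__ == "__main__":
    stream = [
        Event(1, 0, 1, 10),
        Event(1, 0, 1, 20),
        Event(2, 1, 0, 30),
        Event(0, 0, 1, 99),  # falls outside the window below
    ]
    h = polarity_histogram(stream, width=3, height=2, t_start=0, t_end=50)
    print(h[1][0][1], h[0][1][2], h[1][0][0])  # → 2 1 0
```

Event-based detectors such as RVT use richer spatio-temporal tensor representations. The two-channel histogram is only the simplest way to turn an event stream into an image-like input.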
Abstract
Recently, event-based vision sensors have gained attention for autonomous driving applications, as conventional RGB cameras face limitations in challenging dynamic conditions. However, real-world and synthetic event-based vision datasets remain scarce. To address this gap, we present SEVD, a first-of-its-kind synthetic event-based dataset for multi-view ego and fixed perception, recorded with multiple dynamic vision sensors in the CARLA simulator. Data sequences are recorded across diverse lighting (noon, nighttime, twilight) and weather conditions (clear, cloudy, wet, rainy, foggy), with both discrete and continuous domain shifts. SEVD spans urban, suburban, rural, and highway scenes featuring various object classes (car, truck, van, bicycle, motorcycle, and pedestrian). Alongside event data, SEVD includes RGB imagery, depth maps, optical flow, and semantic and instance segmentation, enabling a comprehensive understanding of each scene. We evaluate the dataset with state-of-the-art event-based (RED, RVT) and frame-based (YOLOv8) methods for traffic-participant detection and provide baseline benchmarks. Additionally, we conduct experiments to assess how well the synthetic event-based data generalizes. The dataset is available at https://eventbasedvision.github.io/SEVD.
