DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving

Tianqi Wang, Sukmin Kim, Wenxuan Ji, Enze Xie, Chongjian Ge, Junsong Chen, Zhenguo Li, Ping Luo

TL;DR

This paper introduces DeepAccident, the first large-scale V2X autonomous driving dataset that embeds safety-critical accident scenarios generated in a realistic simulator, enabling end-to-end motion and accident prediction alongside perception tasks. It provides four vehicles plus infrastructure with multi-view cameras and LiDAR, totaling 285k annotated samples and 57k V2X frames, and defines a new Accident Prediction Accuracy (APA) metric to quantify predictive safety performance. A novel V2X model, V2XFormer, leveraging Swin Transformer-based BEV features and advanced fusion (CoBEVT), demonstrates superior motion, detection, and accident-prediction performance compared with single-vehicle baselines. The dataset supports robust evaluation under varying accident visibility, prediction horizons, and communication delays, and demonstrates beneficial sim-to-real transfer when finetuned on nuScenes.

Abstract

Safety is the primary priority of autonomous driving. Nevertheless, no published dataset currently supports the direct and explainable safety evaluation for autonomous driving. In this work, we propose DeepAccident, a large-scale dataset generated via a realistic simulator containing diverse accident scenarios that frequently occur in real-world driving. The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset with 40k annotated samples. In addition, we propose a new task, end-to-end motion and accident prediction, which can be used to directly evaluate the accident prediction ability for different autonomous driving algorithms. Furthermore, for each scenario, we set four vehicles along with one infrastructure to record data, thus providing diverse viewpoints for accident scenarios and enabling V2X (vehicle-to-everything) research on perception and prediction tasks. Finally, we present a baseline V2X model named V2XFormer that demonstrates superior performance for motion and accident prediction and 3D object detection compared to the single-vehicle model.
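
The dataset sizes quoted above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes that each of the 57K V2X frames yields one annotated sample per recording agent (four vehicles plus one infrastructure unit), which is consistent with the 285K figure but is an inference rather than a statement from the abstract. The variable names are hypothetical.

```python
# Figures quoted in the abstract (rounded to the nearest thousand).
V2X_FRAMES = 57_000
NUSCENES_SAMPLES = 40_000
VEHICLES_PER_SCENARIO = 4
INFRASTRUCTURE_PER_SCENARIO = 1

# Assumption (not stated explicitly in the abstract): every V2X frame
# contributes one annotated sample for each recording agent.
agents_per_frame = VEHICLES_PER_SCENARIO + INFRASTRUCTURE_PER_SCENARIO
total_samples = V2X_FRAMES * agents_per_frame

# Ratio behind the "approximately 7 times" comparison with nuScenes.
ratio_to_nuscenes = total_samples / NUSCENES_SAMPLES

print(total_samples)                # 285000
print(round(ratio_to_nuscenes, 1))  # 7.1
```

Under that assumption, 57K frames times 5 agents gives exactly 285K samples, and 285K / 40K is about 7.1, matching the "approximately 7 times" claim.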

Paper Structure

This paper contains 15 sections, 1 equation, 15 figures, and 8 tables.

Figures (15)

  • Figure 1: Illustration of our proposed end-to-end motion and accident prediction task. Given historical camera observations, the single-vehicle model (vehicle #1) fails to predict any motion or accident on the forward right side due to occlusion from buildings. In contrast, the V2X model communicates with other vehicles and infrastructure, thereby successfully anticipating the upcoming accident. The red and green bounding boxes in the images represent the colliding vehicles and the other V2X vehicles behind them, respectively.
  • Figure 2: Designed accident scenarios in DeepAccident across signalized and unsignalized intersections. Each scenario involves two colliding vehicles with overlapping trajectories and two following vehicles. In signalized cases, the designed scenarios include: (1) running a red light at four-way intersections, (2) left turn against a red light at four-way intersections, (3) unprotected left turn at four-way intersections, (4) right turn against left turn at four-way intersections, (5) right turn against left turn at three-way intersections, and (6) going straight against right turn at three-way intersections. In unsignalized cases, the designed overlapping trajectories are the same, but there are no traffic lights to affect vehicle behavior.
  • Figure 3: Distribution of the proposed DeepAccident dataset
  • Figure 4: Network details of the proposed V2XFormer. We use the three-V2X-agent setting consisting of ego AV, AV, and Infra for illustration. V2X agents in V2XFormer utilize a shared-weight BEV extractor to extract BEV features based on multi-view camera observation history within the previous N frames.
  • Figure 5: Performance comparison between the single-vehicle model and models with different V2X configurations vs. accident visibility.
  • ...and 10 more figures