PAVE: An End-to-End Dataset for Production Autonomous Vehicle Evaluation

Xiangyu Li, Chen Wang, Yumao Liu, Dengbo He, Jiahao Zhang, Ke Ma

TL;DR

The paper introduces PAVE, the Production Autonomous Vehicle Evaluation dataset, the first large-scale end-to-end benchmark captured entirely in autonomous-driving mode and spanning diverse environments. It provides synchronized multi-camera perception data and centimeter-level GNSS/IMU trajectories, with explicit driving-mode labels and driver intent to enable end-to-end evaluation of motion planning and safety. Key contributions include a comprehensive 4-camera sensing suite, precise coordinate transformations, privacy-preserving data handling, rich frame-level and scenario annotations, a MySQL-based database schema, and baseline experiments for object detection and end-to-end planning, achieving an overall ADE of $1.47$ m and FDE of $8.16$ m. The dataset supports analysis of AV behavior and safety in real-world production systems and is designed to grow by more than 10 hours per week, facilitating ongoing research and benchmarking in perception, planning, and safety evaluation.

Abstract

Most existing autonomous-driving datasets (e.g., KITTI, nuScenes, and the Waymo Perception Dataset) were collected in human-driving mode or in an unidentified driving mode, so they can serve only as early-stage training data for the perception and prediction of autonomous vehicles (AVs). To evaluate the real behavioral safety of AVs whose control logic is a black box, we present the first end-to-end benchmark dataset collected entirely in autonomous-driving mode in the real world. This dataset contains over 100 hours of naturalistic data from multiple production autonomous-driving vehicle models on the market. We segment the original data into 32,727 key frames, each consisting of four synchronized camera images and high-precision GNSS/IMU data (0.8 cm localization accuracy). For each key frame, 20 Hz vehicle trajectories spanning the past 6 s and future 5 s are provided, along with detailed 2D annotations of surrounding vehicles, pedestrians, traffic lights, and traffic signs. These key frames have rich scenario-level attributes, including driver intent, area type (covering highways, urban roads, and residential areas), lighting (day, night, or dusk), weather (clear or rain), road surface (paved or unpaved), traffic and vulnerable road user (VRU) density, traffic lights, and traffic signs (warning, prohibition, and indication). To evaluate the safety of AVs, we employ an end-to-end motion planning model that predicts vehicle trajectories with an Average Displacement Error (ADE) of 1.4 m on autonomous-driving frames. The dataset continues to expand by over 10 hours of new data weekly, thereby providing a sustainable foundation for research on AV driving behavior analysis and safety evaluation. The PAVE dataset is publicly available at https://hkustgz-my.sharepoint.com/:f:/g/personal/kema_hkust-gz_edu_cn/IgDXyoHKfdGnSZ3JbbidjduMAXxs-Z3NXzm005A_Ix9tr0Q?e=9HReCu.
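The ADE and FDE figures quoted above follow the standard displacement-error definitions: ADE is the mean Euclidean distance between predicted and ground-truth waypoints over the prediction horizon, and FDE is that distance at the final waypoint. The sketch below illustrates those definitions only. The function name `displacement_errors`, the 2D `(x, y)` waypoint format, and the assumption that a 5 s future at 20 Hz yields 100 waypoints are illustrative choices, not details confirmed by the paper.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) in metres for two equal-length lists of (x, y) waypoints.

    Hypothetical helper: the paper's exact evaluation protocol is not given here.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-waypoint Euclidean distance between prediction and ground truth.
    dists = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    ade = sum(dists) / len(dists)  # average over the whole horizon
    fde = dists[-1]                # error at the last waypoint only
    return ade, fde


# Assumed horizon: 5 s of future trajectory sampled at 20 Hz.
FUTURE_STEPS = 20 * 5

# Synthetic example: ground truth drives straight along x at 10 m/s,
# while the prediction drifts laterally by 1 cm per step.
truth = [(0.5 * (i + 1), 0.0) for i in range(FUTURE_STEPS)]
pred = [(0.5 * (i + 1), 0.01 * (i + 1)) for i in range(FUTURE_STEPS)]

ade, fde = displacement_errors(pred, truth)
print(f"ADE = {ade:.3f} m, FDE = {fde:.3f} m")
```

Because the lateral drift grows linearly in this synthetic case, the final error is roughly twice the average error, which shows why FDE is typically larger than ADE for the same model.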

Paper Structure

This paper contains 33 sections, 3 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Sensor setup for our data collection platform.
  • Figure 2: Representative examples of our scenario annotations, covering key conditions such as area type (highway, urban, residential), lighting (day, night), weather (clear, rain), and traffic density (high, low). Each sample is shown with 2D detection boxes to illustrate the annotated traffic participants and objects captured.
  • Figure 3: Relational schema of the PAVE dataset database, showing core tables and relationships.
  • Figure 4: Distribution of scenario types across annotated frames. Vehicle and VRU densities are categorized as low (1–5 objects), medium (6–15 objects), and high ($\geq$16 objects) based on per-frame counts of relevant detections. Traffic sign categories follow the GB 5768.2-2022 standard: warning (yellow triangles), prohibition (red circles), and indication (blue circles).
  • Figure 5: Distributions of ADE and FDE errors across different traffic scenarios. Each subplot corresponds to one scenario dimension (area type, lighting, weather, road surface, vehicle/VRU density, traffic lights, and traffic sign categories), with solid lines indicating ADE and dashed lines indicating FDE.
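
The density thresholds given in the Figure 4 caption can be expressed as a small binning function. This is a minimal sketch: the function name `density_category` is hypothetical, and the `"none"` label for frames with zero detections is an assumption, since the caption defines only the low, medium, and high bins.

```python
def density_category(count):
    """Map a per-frame detection count to a density label.

    Thresholds follow the Figure 4 caption: low (1-5), medium (6-15), high (>=16).
    """
    if count < 0:
        raise ValueError("detection count cannot be negative")
    if count == 0:
        return "none"  # assumption: the caption does not define a zero-object bin
    if count <= 5:
        return "low"
    if count <= 15:
        return "medium"
    return "high"


# Example: label a handful of per-frame vehicle counts.
for n in (0, 3, 6, 15, 16, 40):
    print(n, density_category(n))
```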