Scalability in Perception for Autonomous Driving: Waymo Open Dataset

Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Sheng Zhao, Shuyang Cheng, Yu Zhang, Jonathon Shlens, Zhifeng Chen, Dragomir Anguelov

TL;DR

The paper introduces the Waymo Open Dataset, a large-scale, multimodal autonomous driving corpus with synchronized LiDAR and camera data collected across multiple cities to support robust perception benchmarks and domain-generalization research. It provides richly labeled 3D and 2D ground truth with track IDs, rolling-shutter aware projections, and extensive baselines for 2D/3D detection and tracking, highlighting domain gaps across geographies. Through detailed experiments on object detection and tracking, and analyses of domain shift and dataset size, the work demonstrates the dataset’s value for advancing perception in diverse real-world conditions. The dataset and code are publicly released, with plans for ongoing benchmarks and future expansions to support broader driving tasks such as behavior prediction and planning.

Abstract

The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed diversity metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.
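To give a sense of scale, the scene count and scene length quoted in the abstract can be turned into a total recording duration. The following is a minimal sketch of that arithmetic only; the constant and function names are illustrative and not part of the dataset's released code.

```python
# Figures taken from the abstract: 1150 scenes, each spanning 20 seconds.
NUM_SCENES = 1150
SCENE_SECONDS = 20


def total_duration_hours(num_scenes: int, scene_seconds: float) -> float:
    """Return the combined duration of all scenes, in hours."""
    return num_scenes * scene_seconds / 3600.0


total_seconds = NUM_SCENES * SCENE_SECONDS
total_hours = total_duration_hours(NUM_SCENES, SCENE_SECONDS)
print(f"{total_seconds} s = {total_hours:.2f} h")  # 23000 s = 6.39 h
```

Under these figures the dataset covers roughly 6.4 hours of synchronized driving data.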

Paper Structure

This paper contains 23 sections, 4 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: Sensor layout and coordinate systems.
  • Figure 2: LiDAR label example. Yellow = vehicle. Red = pedestrian. Blue = sign. Pink = cyclist.
  • Figure 3: Camera-LiDAR synchronization accuracy. The x-axis is in milliseconds; the y-axis is the percentage of data frames.
  • Figure 4: A range image example. It is cropped to only show the front 90°. The first three rows are range, intensity, and elongation from the first LiDAR return. The last three are range, intensity, and elongation from the second LiDAR return.
  • Figure 5: An example image overlaid with LiDAR point projections.
  • ...and 1 more figure