One Thousand and One Hours: Self-driving Motion Prediction Dataset
John Houston, Guido Zuidhof, Luca Bergamini, Yawei Ye, Long Chen, Ashesh Jain, Sammy Omari, Vladimir Iglovikov, Peter Ondruska
TL;DR
The paper addresses the need for large, open datasets to advance motion forecasting and planning for self-driving vehicles (SDVs). It releases a dataset of 1,118 hours of driving, split into 170,000 scenes recorded along a fixed route. The release also includes a rich HD semantic map, aerial imagery, the L5Kit software toolkit, and baseline benchmarks for forecasting and planning. The study shows that increasing the amount of training data measurably improves both forecasting accuracy and planning performance, which underscores data as a key driver of ML-based SDV capabilities. By openly releasing a high-detail, route-focused dataset, the work aims to democratize development and accelerate progress toward robust, scalable autonomous driving.
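Forecasting accuracy is commonly measured with displacement errors between predicted and ground-truth future positions. Below is a minimal sketch, assuming average and final displacement error (ADE/FDE) over (x, y) positions in metres. These are standard forecasting metrics and are not necessarily the exact metrics used in the paper's benchmarks. The function name `displacement_errors` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y), assumed to be in metres

def displacement_errors(pred: Sequence[Point], truth: Sequence[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory (hypothetical helper).

    ADE: mean Euclidean distance over all future timesteps.
    FDE: Euclidean distance at the final timestep.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and equally long")
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]

# Usage: the prediction drifts 1 m further from the ground truth each step.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
print(ade, fde)  # 1.0 2.0
```

FDE isolates long-horizon error, while ADE averages over the whole horizon. Reporting both shows whether a model that is accurate early in the horizon degrades toward the end.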
Abstract
Motivated by the impact of large-scale datasets on ML systems, we present the largest self-driving dataset for motion prediction to date, containing over 1,000 hours of data. This was collected by a fleet of 20 autonomous vehicles along a fixed route in Palo Alto, California, over a four-month period. It consists of 170,000 scenes, where each scene is 25 seconds long and captures the perception output of the self-driving system, which encodes the precise positions and motions of nearby vehicles, cyclists, and pedestrians over time. On top of this, the dataset contains a high-definition semantic map with 15,242 labelled elements and a high-definition aerial view of the area. We show that using a dataset of this size dramatically improves performance for key self-driving problems. Combined with the provided software kit, this collection forms the largest and most detailed dataset to date for the development of self-driving machine learning tasks, such as motion forecasting, motion planning and simulation. The full dataset is available at http://level5.lyft.com/.
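Each scene is described as a time series of perception output: the positions and motions of nearby agents. The sketch below shows one way such a scene could be held in memory, assuming a simple frames-of-agents layout. The class names, field names, units, and the 10 Hz sampling rate are hypothetical illustrations. They are not the actual L5Kit schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record types; field names and units are illustrative
# assumptions, not the L5Kit on-disk format.

@dataclass
class AgentState:
    track_id: int                   # identity of the agent across frames
    label: str                      # e.g. "vehicle", "cyclist", "pedestrian"
    position: Tuple[float, float]   # (x, y) in metres (assumed)
    velocity: Tuple[float, float]   # (vx, vy) in m/s (assumed)

@dataclass
class Frame:
    timestamp_s: float              # seconds since scene start (assumed)
    agents: List[AgentState] = field(default_factory=list)

@dataclass
class Scene:
    frames: List[Frame]

    def duration_s(self) -> float:
        """Time spanned between the scene's first and last frames."""
        if len(self.frames) < 2:
            return 0.0
        return self.frames[-1].timestamp_s - self.frames[0].timestamp_s

    def track(self, track_id: int) -> List[Tuple[float, Tuple[float, float]]]:
        """(timestamp, position) history of one agent, in frame order."""
        return [
            (frame.timestamp_s, agent.position)
            for frame in self.frames
            for agent in frame.agents
            if agent.track_id == track_id
        ]

def total_hours(scenes: List[Scene]) -> float:
    """Sum of scene durations, converted from seconds to hours."""
    return sum(scene.duration_s() for scene in scenes) / 3600.0

# Usage: a toy 25 s scene sampled at an assumed 10 Hz, containing one
# pedestrian walking along x at 1.4 m/s.
HZ = 10
frames = [
    Frame(
        timestamp_s=i / HZ,
        agents=[AgentState(7, "pedestrian", (1.4 * i / HZ, 0.0), (1.4, 0.0))],
    )
    for i in range(25 * HZ + 1)
]
scene = Scene(frames)
print(scene.duration_s())   # 25.0
print(len(scene.track(7)))  # 251
```

A persistent `track_id` lets a forecasting model recover each agent's past trajectory from per-frame snapshots. That trajectory history is the input that motion prediction conditions on.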
