WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting

Kan Chen; Runzhou Ge; Hang Qiu; Rami Al-Rfou; Charles R. Qi; Xuanyu Zhou; Zoey Yang; Scott Ettinger; Pei Sun; Zhaoqi Leng; Mustafa Baniodeh; Ivan Bogun; Weiyue Wang; Mingxing Tan; Dragomir Anguelov

TL;DR

WOMD-LiDAR addresses the dependence of motion forecasting on lossy, perception-derived inputs by releasing a large-scale raw LiDAR dataset aligned with the Waymo Open Motion Dataset (WOMD). It introduces a two-stage baseline that feeds SWFormer-derived LiDAR embeddings to WayFormer as an extra modality, and this baseline shows measurable improvements in trajectory prediction metrics. The dataset comprises over 100k scenes of 20 seconds each and uses an 8× compression scheme to bring its size down to about 2.3 TB, which makes distribution and use practical. Overall, the work shows that raw LiDAR data can improve motion forecasting performance, and it opens avenues for end-to-end sensing-to-prediction methods and richer scene understanding.
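To put the quoted sizes in perspective, the following is a back-of-envelope sketch using only the rounded figures from the TL;DR (about 2.3 TB, 8× compression, over 100k scenes). The function name `estimate_sizes` and the use of decimal units (1 TB = 10^6 MB) are assumptions made for illustration; the results are implied estimates, not numbers reported by the authors.

```python
# Back-of-envelope storage estimate from the rounded figures in the TL;DR.
# All outputs are implied estimates, not values reported by the paper.

COMPRESSED_TB = 2.3      # "about 2.3 TB" after compression
COMPRESSION_RATIO = 8    # "8x compression"
NUM_SCENES = 100_000     # "over 100k scenes": a lower bound, so per-scene
                         # sizes computed from it are upper bounds

MB_PER_TB = 1e6          # assumption: decimal units


def estimate_sizes(compressed_tb, ratio, num_scenes):
    """Return (raw_tb, compressed_mb_per_scene, raw_mb_per_scene)."""
    raw_tb = compressed_tb * ratio
    compressed_mb_per_scene = compressed_tb * MB_PER_TB / num_scenes
    raw_mb_per_scene = raw_tb * MB_PER_TB / num_scenes
    return raw_tb, compressed_mb_per_scene, raw_mb_per_scene


raw_tb, comp_mb, raw_mb = estimate_sizes(
    COMPRESSED_TB, COMPRESSION_RATIO, NUM_SCENES
)
print(f"implied uncompressed size: {raw_tb:.1f} TB")
print(f"compressed size per scene: {comp_mb:.0f} MB")
print(f"uncompressed size per scene: {raw_mb:.0f} MB")
```

Under these assumptions the uncompressed data would be on the order of 18 TB, or roughly 23 MB per 20-second scene after compression, which is why the compression step matters for distribution.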

Abstract

Widely adopted motion forecasting datasets substitute the observed sensory inputs with higher-level abstractions such as 3D boxes and polylines. These sparse shapes are inferred by annotating the original scenes with the predictions of perception systems. Such intermediate representations tie the quality of motion forecasting models to the performance of computer vision models. Moreover, the human-designed explicit interfaces between perception and motion forecasting typically pass only a subset of the semantic information present in the original sensory input. To study the effect of these modular approaches, design new paradigms that mitigate these limitations, and accelerate the development of end-to-end motion forecasting models, we augment the Waymo Open Motion Dataset (WOMD) with large-scale, high-quality, diverse LiDAR data for the motion forecasting task. The augmented dataset, WOMD-LiDAR, consists of over 100,000 scenes that each span 20 seconds, with well-synchronized and calibrated high-quality LiDAR point clouds captured across a range of urban and suburban geographies (https://waymo.com/open/data/motion/). Compared to the Waymo Open Dataset (WOD), WOMD-LiDAR contains 100x more scenes. Furthermore, we integrate the LiDAR data into motion forecasting model training and provide a strong baseline. Experiments show that the LiDAR data improves performance on the motion forecasting task. We hope that WOMD-LiDAR will provide new opportunities for boosting end-to-end motion forecasting models.

Paper Structure (19 sections, 2 equations, 6 figures, 7 tables)

Introduction
Related Work
Dataset
Dataset Statistics
LiDAR Data Format
LiDAR Data Compression
Motion Forecasting Model with LiDAR
Motion Forecasting Model
LiDAR Encoding Scheme
Experiments
Experiment Setup
Metrics
Baseline Model Performance
Ablation Study
Qualitative Results
...and 4 more sections

Figures (6)

Figure 1: Human-interpretable labels from the perception system provide limited information at the scene level and the object level. In sophisticated scenes with interaction between multiple objects, raw sensor data provides rich information and helps improve the motion forecasting performance. Legends in the figure: Yellow and blue (highlighted) trajectories are predictions for different agents. Red dotted lines are agents' ground truth trajectories.
Figure 2: Visualization of a range image from the top LiDAR sensor in WOMD-LiDAR. The three rows show range, (normalized) intensity, and (normalized) elongation from the first LiDAR return (the second return is omitted for brevity). We crop the range images to show only the front 180$^{\circ}$.
Figure 3: Model structures of the LiDAR encoder (left) and the motion forecasting model (right). To encode LiDAR data, we adopt a pre-trained SWFormer (Sun et al., 2022) model and extract the embedding features (which can be decoded to produce detection results). Those features (in the light yellow box) from different scales are concatenated and fed to a WayFormer (Nayakanti et al., 2022) model as a new modality feature for the motion forecasting task.
Figure 4: Comparison of prediction results between WayFormer (Nayakanti et al., 2022) (sub-figures on the left) and WayFormer with LiDAR inputs (sub-figures on the right). Fig (a): With LiDAR information, the predicted trajectories avoid crashing into parked cars. Fig (b): The predicted trajectories of cyclists avoid crashing into cars. Legends in the figure: Yellow and blue trajectories are predictions for different agents; blue marks the highlighted ones. Red dotted lines are labeled ground truth trajectories for agents in the scene.
Figure 5: Scenario visualizations with LiDAR. Best viewed in color; zoom in for more details.
...and 1 more figure
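The front-180° crop mentioned in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' visualization code: it assumes the range image columns cover 360° of azimuth uniformly and that the forward direction falls on the center column. The helper names `front_crop_bounds` and `crop_front` are hypothetical, and the image is modeled as a plain list of rows to keep the example dependency-free.

```python
# Minimal sketch of cropping a 360-degree range image to a front-facing
# field of view. Assumptions (not confirmed by the source): columns are
# uniformly spaced in azimuth and the forward direction is the center column.

def front_crop_bounds(width, fov_deg=180.0):
    """Return (start, stop) column indices of a centered crop of fov_deg."""
    if width <= 0:
        raise ValueError("width must be positive")
    if not 0 < fov_deg <= 360:
        raise ValueError("fov_deg must be in (0, 360]")
    keep = round(width * fov_deg / 360.0)   # number of columns to keep
    start = (width - keep) // 2             # center the crop
    return start, start + keep


def crop_front(range_image, fov_deg=180.0):
    """Crop every row of a range image (list of rows) to the front view."""
    width = len(range_image[0])
    start, stop = front_crop_bounds(width, fov_deg)
    return [row[start:stop] for row in range_image]


# Toy example: 2 rows x 8 columns, values are just column indices.
toy_image = [list(range(8)), list(range(8))]
cropped = crop_front(toy_image)
print(cropped)
```

The same slicing would apply to each channel (range, intensity, elongation) independently; if the sensor's column-to-azimuth mapping differs from the assumption above, only `front_crop_bounds` needs to change.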
