
Pre-training on Synthetic Driving Data for Trajectory Prediction

Yiheng Li, Seth Z. Zhao, Chenfeng Xu, Chen Tang, Chenran Li, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan

TL;DR

This work tackles data scarcity in trajectory forecasting for autonomous driving. It generates synthetic driving data through HD-map augmentation and rule-based trajectory synthesis, followed by self-supervised pre-training on that synthetic data. The authors adapt Masked AutoEncoder (MAE)-style pre-training to learn general scene representations and then fine-tune on real data. This improves on the baseline prediction model by $5.04\%$, $3.84\%$, and $8.30\%$ in $MR_6$, $minADE_6$, and $minFDE_6$, respectively. A key finding is that pre-training on synthetic data, especially with self-supervised strategies, yields larger gains than directly augmenting the real training data or using supervised pre-training, which reduces the amount of real driving data needed. The approach offers a practical pipeline for data expansion and representation learning in trajectory prediction, with implications for data efficiency and, potentially, cross-domain generalization in autonomous driving.
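The three metrics quoted above are best-of-K measures over K = 6 predicted trajectories. The sketch below rests on one stated assumption: that evaluation follows the Argoverse 1 convention. The MIA and PIT cities in Figure 3 suggest that benchmark, but this section does not state it. Under that convention, the code picks the prediction with the smallest endpoint error and reports its average displacement error as $minADE_6$ and its endpoint error as $minFDE_6$. It counts a miss when that endpoint error exceeds 2.0 m, and $MR_6$ is the fraction of agents with a miss. The function names and the 2.0 m threshold are assumptions for illustration, not taken from the authors' released code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def _dist(a: Point, b: Point) -> float:
    """Euclidean (L2) distance between two 2-D points, in metres."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def ade(pred: Trajectory, gt: Trajectory) -> float:
    """Average displacement error over all future timesteps."""
    return sum(_dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred: Trajectory, gt: Trajectory) -> float:
    """Final displacement error: distance at the last timestep."""
    return _dist(pred[-1], gt[-1])


def best_of_k(preds: List[Trajectory], gt: Trajectory, k: int = 6,
              miss_threshold: float = 2.0) -> Tuple[float, float, bool]:
    """Return (minADE_k, minFDE_k, is_miss) for one agent.

    Assumed Argoverse-1-style convention: the best of the first k
    predictions is the one with the lowest endpoint error, and the
    ADE of that same prediction is reported as minADE_k.
    """
    best = min(preds[:k], key=lambda p: fde(p, gt))
    min_fde = fde(best, gt)
    return ade(best, gt), min_fde, min_fde > miss_threshold


def dataset_metrics(
        results: List[Tuple[float, float, bool]]) -> Tuple[float, float, float]:
    """Average per-agent results; the miss rate MR_k is the fraction of misses."""
    n = len(results)
    return (sum(r[0] for r in results) / n,
            sum(r[1] for r in results) / n,
            sum(1 for r in results if r[2]) / n)
```

Some benchmarks instead take the minimum ADE over the K predictions independently of the endpoint. Numbers are only comparable when they are computed under the same convention.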

Abstract

Accumulating substantial volumes of real-world driving data is pivotal for trajectory forecasting in autonomous driving. Because current trajectory forecasting models rely heavily on data-driven methods, we aim to tackle the challenge of learning general trajectory forecasting representations when data are limited. We propose a pipeline-level solution to mitigate data scarcity in trajectory forecasting. The solution has two parts: first, we use HD map augmentation and trajectory synthesis to generate driving data; then, we learn representations by pre-training on these data. Specifically, we apply vector transformations to reshape the maps and then employ a rule-based model to generate trajectories on both the original and the augmented scenes, thus enlarging the driving data without collecting additional real-world data. To foster the learning of general representations from this augmented dataset, we comprehensively explore different pre-training strategies, including extending the concept of a Masked AutoEncoder (MAE) to trajectory forecasting. Without bells and whistles, our proposed pipeline-level solution is general, simple, yet effective: extensive experiments demonstrate the effectiveness of our data expansion and pre-training strategies, which outperform the baseline prediction model by large margins, e.g., 5.04%, 3.84%, and 8.30% in terms of $MR_6$, $minADE_6$, and $minFDE_6$, respectively. The pre-training dataset and the code for pre-training and fine-tuning are released at https://github.com/yhli123/Pretraining_on_Synthetic_Driving_Data_for_Trajectory_Prediction.
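The abstract describes two synthesis steps: reshaping the map with vector transformations and generating trajectories with a rule-based model. This section does not give the exact transformations or rules, so the sketch below uses hypothetical stand-ins for both. The first is an affine scale-and-rotate of every lane vertex about the map centroid. The second is a constant-speed lane follower that interpolates along a centerline by arc length. Lanes are assumed to be polylines of (x, y) points in metres, and `transform_lanes` and `follow_lane` are illustrative names, not the authors' API.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Polyline = List[Point]


def transform_lanes(lanes: List[Polyline], angle_rad: float,
                    sx: float, sy: float) -> List[Polyline]:
    """Hypothetical map augmentation: scale, then rotate, every lane vertex
    about the centroid of all vertices. Each lane stays a polyline, so the
    ordering of its vertices (and hence its direction) is preserved."""
    pts = [p for lane in lanes for p in lane]
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    out = []
    for lane in lanes:
        new_lane = []
        for x, y in lane:
            # Anisotropic scaling relative to the centroid, then rotation.
            dx, dy = (x - cx) * sx, (y - cy) * sy
            new_lane.append((cx + c * dx - s * dy, cy + s * dx + c * dy))
        out.append(new_lane)
    return out


def follow_lane(lane: Polyline, speed: float, dt: float,
                steps: int) -> Polyline:
    """Hypothetical rule-based motion model: drive along a lane centerline
    (at least two vertices) at constant speed, stopping at the lane's end."""
    # Cumulative arc length at each vertex.
    cum = [0.0]
    for a, b in zip(lane, lane[1:]):
        cum.append(cum[-1] + math.hypot(b[0] - a[0], b[1] - a[1]))
    traj = []
    for t in range(steps + 1):
        d = min(speed * dt * t, cum[-1])
        # Find the segment containing arc length d, then interpolate in it.
        i = 0
        while i < len(cum) - 2 and cum[i + 1] < d:
            i += 1
        seg = cum[i + 1] - cum[i]
        r = 0.0 if seg == 0 else (d - cum[i]) / seg
        (x0, y0), (x1, y1) = lane[i], lane[i + 1]
        traj.append((x0 + r * (x1 - x0), y0 + r * (y1 - y0)))
    return traj
```

Calling `follow_lane` on both the original and the transformed lanes yields trajectories for both kinds of scenes. This sketch omits lane changes and interactions between agents.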

Paper Structure

This paper contains 22 sections, 7 equations, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: Our data synthesis and self-supervised pre-training pipeline improves prediction performance without extra real-world driving data. Each circle's area is proportional to the total number of synthetic and real driving scenes used, which is given inside the circle. (Lower $minFDE_6$ is better.)
  • Figure 2: Pipeline of driving data synthesis and utilization. We augment the map and generate trajectories on it to obtain synthetic motion data, which are then used for pre-training via masking and reconstruction. The pre-trained model initializes the backbone model for fine-tuning.
  • Figure 3: Comparison of data distributions between the synthetic dataset (blue) and the real-world dataset (red) in terms of speed and direction. (a) shows the map of MIA city. (b) shows the trajectory distribution of the scenes in (a), revealing a divergence in velocity. (c) shows the trajectories in (b) rotated to the same initial direction, revealing a divergence in direction. (d), (e), and (f) are the counterparts of (a), (b), and (c) for PIT city.
  • Figure 4: Performance comparison without and with pre-training. Gray lines indicate lane boundaries. The green line and star indicate the ground-truth trajectory and its last point, while the orange line and star indicate the predicted trajectory and its last point. The orange background shows the likelihood of each point being the predicted last point of the trajectory. (a) and (b) show the prediction results without and with trajectory pre-training; (c) and (d) show the results without and with map pre-training.
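Figure 2 summarizes pre-training as masking and reconstruction. The sketch below shows only the masking bookkeeping common to MAE-style objectives. It hides a random subset of input tokens, passes the visible ones to an encoder, and scores reconstruction on the hidden ones. The token layout (one feature vector per trajectory or map segment), the 0.5 mask ratio, and the names `random_mask` and `masked_mse` are assumptions for illustration. This section does not specify the paper's actual tokenization, mask ratio, or network, and the all-zero prediction is a placeholder for a real encoder-decoder.

```python
import random
from typing import List, Sequence, Tuple

Token = List[float]  # assumed: one feature vector per trajectory or map segment


def random_mask(num_tokens: int, mask_ratio: float,
                rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly split token indices into (visible, masked), MAE-style."""
    num_masked = int(round(num_tokens * mask_ratio))
    order = list(range(num_tokens))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def masked_mse(pred: Sequence[Token], target: Sequence[Token],
               masked: Sequence[int]) -> float:
    """Mean squared reconstruction error, computed on masked tokens only."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [[float(i), 0.5 * i] for i in range(10)]  # toy token features
    visible, masked = random_mask(len(tokens), mask_ratio=0.5, rng=rng)
    encoder_input = [tokens[i] for i in visible]  # only visible tokens reach the encoder
    # Placeholder for encoder + decoder: a real model predicts every position.
    pred = [[0.0, 0.0] for _ in tokens]
    print(len(encoder_input), masked_mse(pred, tokens, masked))
```

Because the loss covers only masked positions, the encoder must infer hidden parts of the scene from the visible context. After pre-training, the encoder weights would initialize the backbone for fine-tuning, as Figure 2 describes.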