
PPT: Pretraining with Pseudo-Labeled Trajectories for Motion Forecasting

Yihong Xu, Yuan Yin, Éloi Zablocki, Tuan-Hung Vu, Alexandre Boulch, Matthieu Cord

TL;DR

PPT tackles the high cost and domain sensitivity of motion forecasting by pretraining on pseudo-labeled trajectories generated from off-the-shelf detectors and non-learning trackers, embracing noise and diversity as regularizers. The method pretrains forecasting models on large, automated, multi-source data and optionally finetunes on a smaller set of labeled data, delivering strong gains in annotation-efficient settings and across cross-domain, end-to-end, and multi-class benchmarks. Key findings show improved generalization, faster finetuning convergence, and scalable benefits from aggregating diverse pseudo-labels. This approach offers a practical path to robust motion forecasting in varied driving contexts without heavy manual annotation or post-processing.
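The data side of this recipe can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code. Each (detector, tracker) source is represented as a list of `(agent_id, points)` pairs, and the helper names `build_pretraining_pool` and `subsample_labeled` are hypothetical. The sketch shows two things. Pseudo-labels from every source are concatenated without deduplication, so diverse versions of the same motion coexist. Finetuning then uses only a small labeled fraction, such as the 1% to 10% regimes mentioned in Figure 1.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    agent_id: int   # track ID, local to the source that produced it
    points: tuple   # ((x, y), ...), one point per timestep
    source: str     # hypothetical tag naming the (detector, tracker) pair


def build_pretraining_pool(pseudo_label_sets):
    """Concatenate pseudo-labeled trajectories from every source.

    No deduplication or cleanup is applied: slightly different versions of the
    same agent's motion from different sources are all kept as training samples.
    """
    pool = []
    for source, trajectories in pseudo_label_sets.items():
        for agent_id, points in trajectories:
            pool.append(Trajectory(agent_id, tuple(points), source))
    return pool


def subsample_labeled(labeled, fraction, seed=0):
    """Draw a reproducible random subset of labeled samples for optional finetuning."""
    rng = random.Random(seed)
    k = max(1, round(len(labeled) * fraction))
    return rng.sample(labeled, k)
```

Usage: pretrain on `build_pretraining_pool({...})`, then optionally finetune on `subsample_labeled(labeled_samples, 0.01)`.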

Abstract

Accurately predicting how agents move in dynamic scenes is essential for safe autonomous driving. State-of-the-art motion forecasting models rely on large curated datasets with manually annotated or heavily post-processed trajectories. However, building these datasets is costly, largely manual, hard to scale, and difficult to reproduce. They also introduce domain gaps that limit generalization across environments. We introduce PPT (Pretraining with Pseudo-labeled Trajectories), a simple and scalable alternative that uses unprocessed and diverse trajectories automatically generated by off-the-shelf 3D detectors and trackers. Unlike traditional pipelines aiming for clean, single-label annotations, PPT embraces noise and diversity as useful signals for learning robust representations. With optional finetuning on a small amount of labeled data, models pretrained with PPT achieve strong performance across standard benchmarks, particularly in low-data regimes, and in cross-domain, end-to-end, and multi-class settings. PPT is easy to implement and improves generalization in motion forecasting. Code and data will be released upon acceptance.
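To make "trajectories automatically generated by 3D detectors and trackers" concrete, here is a minimal sketch of a non-learning tracker. It links per-frame detection centers into trajectories using greedy nearest-neighbor association under a distance gate. This is a simplified stand-in, not the tracker used by PPT. It has no motion model and no track coasting, so an unmatched track simply ends. The function name `greedy_track` and the gate `max_dist` (in meters) are hypothetical.

```python
import math


def greedy_track(frames, max_dist=2.0):
    """Link per-frame detections into trajectories with greedy nearest-neighbor matching.

    frames: list over time; each entry is a list of (x, y) detection centers.
    Returns a dict mapping track_id -> list of (frame_index, x, y).
    """
    tracks = {}   # track_id -> full history of (t, x, y)
    active = {}   # track_id -> last (x, y), only for tracks matched in the previous frame
    next_id = 0
    for t, dets in enumerate(frames):
        # Collect all (distance, track, detection) pairs inside the gate.
        pairs = []
        for tid, (px, py) in active.items():
            for j, (x, y) in enumerate(dets):
                d = math.hypot(x - px, y - py)
                if d <= max_dist:
                    pairs.append((d, tid, j))
        pairs.sort()  # closest pairs are matched first

        used_tracks, used_dets, new_active = set(), set(), {}
        for _, tid, j in pairs:
            if tid in used_tracks or j in used_dets:
                continue
            used_tracks.add(tid)
            used_dets.add(j)
            x, y = dets[j]
            tracks[tid].append((t, x, y))
            new_active[tid] = (x, y)

        # Every unmatched detection starts a new track; unmatched tracks are dropped.
        for j, (x, y) in enumerate(dets):
            if j not in used_dets:
                tracks[next_id] = [(t, x, y)]
                new_active[next_id] = (x, y)
                next_id += 1
        active = new_active
    return tracks
```

Running the same association on the outputs of several different detectors yields several slightly different trajectory sets for the same scene. That is the kind of diversity the abstract describes.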

Paper Structure

This paper contains 24 sections, 2 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: No pretraining (gray) vs. Pretraining (orange) with Pseudo-labeled Trajectories (PPT). PPT improves motion forecasting performance (e.g., MissRate↓) through pretraining with pseudo-labeled trajectories, especially in annotation-efficient regimes where only 1% to 10% of the ground-truth labeled trajectories are used for finetuning.
  • Figure 2: Illustration of PPT. On the left, we show the conventional approach to training a motion forecasting model from scratch with annotated labels, often from human curation. By contrast, on the right, we present a pretraining pipeline with pseudo-labeled trajectories from different 3D detectors and (non-learning) trackers (Dtr, Trk), followed by an optional finetuning phase (shaded in gray).
  • Figure 3: Diverse pseudo-labeled trajectories. Compared with the single curated annotation in WOD, pseudo-labeling provides not only a training example close to the ground truth but also other feasible trajectories.
  • Figure 4: Data diversity in pretraining matters. MTR models are pretrained, without finetuning, on exactly 15,290 samples from 100 WOD scenarios. These samples come from one, two, or four detection models. All models are evaluated on the full WOD validation set.
  • Figure 5: Efficient finetuning. Evolution of brier-FDE on the validation sets while training the MTR forecasting model from scratch or finetuning it after PPT pretraining. The model pretrained with PPT converges faster and achieves better results (indicated by arrows).
  • ...and 4 more figures
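The figures report MissRate (Figure 1) and brier-FDE (Figure 5). The sketch below shows how such endpoint metrics are commonly computed for a multi-modal forecast. It assumes K predicted trajectories with mode probabilities summing to 1. It also assumes the Argoverse 2 style definitions: minFDE is the smallest endpoint error over the modes; brier-minFDE adds (1 - p)^2, where p is the probability of that best mode; and a miss is a best endpoint error above 2 m. WOD's official MissRate instead uses speed-dependent lateral and longitudinal thresholds, so treat these functions as illustrative only. All function names here are hypothetical.

```python
import math


def forecast_metrics(pred_modes, probs, gt_end, miss_threshold=2.0):
    """Endpoint metrics for one agent.

    pred_modes: K predicted trajectories, each a list of (x, y) points.
    probs: K mode probabilities (assumed to sum to 1).
    gt_end: ground-truth final position (x, y).
    miss_threshold: endpoint distance in meters above which the sample is a miss
        (Argoverse-style simplification, not WOD's official definition).
    """
    errors = [math.dist(mode[-1], gt_end) for mode in pred_modes]
    best = min(range(len(errors)), key=errors.__getitem__)  # mode with lowest endpoint error
    min_fde = errors[best]
    return {
        "minFDE": min_fde,
        "miss": min_fde > miss_threshold,
        "brier-minFDE": min_fde + (1.0 - probs[best]) ** 2,
    }


def miss_rate(results):
    """Fraction of samples whose best endpoint error exceeds the miss threshold."""
    return sum(r["miss"] for r in results) / len(results)
```

The Brier term penalizes a model that places the correct mode at low probability, even when that mode's endpoint is accurate.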