Improving Partially Observed Trajectories Forecasting by Target-driven Self-Distillation

Peng Shu, Pengfei Zhu, Mengshi Qi, Liang Liu

TL;DR

The paper tackles robust motion forecasting under partial observations by introducing Target-driven Self-Distillation (TSD), a single-stage end-to-end framework. It combines an encoder, a Transformer-based anchor-free target point generator, and a multi-modal trajectory predictor guided by sequential targets. A self-distillation loss based on Maximum Mean Discrepancy (MMD) aligns feature distributions between fully and partially observed inputs, enabling robust predictions without extra parameters. Experiments on Argoverse and nuScenes show improved robustness under partial observation and maintained or enhanced performance under full observation, with efficiency gains over distillation-based baselines. The work offers a practical, plug-and-play approach for real-world autonomous driving systems, and the authors state that they will release code and checkpoints for reproducibility.
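To make the MMD-based alignment concrete, here is a minimal sketch of a squared MMD estimate between two small sets of feature vectors. It is illustrative only: the Gaussian (RBF) kernel, the `bandwidth` value, the biased estimator, and the function names are assumptions, since this summary does not state which kernel or which features the paper uses.

```python
import math


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel between two equal-length feature vectors (assumed kernel)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(full_feats, partial_feats, bandwidth=1.0):
    """Biased estimate of squared MMD between two non-empty sets of features.

    full_feats:    features from fully observed trajectories (hypothetical input)
    partial_feats: features from partially observed trajectories (hypothetical input)
    """
    return (
        mean_kernel(full_feats, full_feats, bandwidth)
        + mean_kernel(partial_feats, partial_feats, bandwidth)
        - 2.0 * mean_kernel(full_feats, partial_feats, bandwidth)
    )
```

The estimate is zero when both feature sets are identical and grows as the two distributions move apart, so minimizing it as a loss term pulls the partially observed features toward the fully observed ones.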

Abstract

Accurate prediction of future trajectories of traffic agents is essential for ensuring safe autonomous driving. However, partially observed trajectories can significantly degrade the performance of even state-of-the-art models. Previous approaches often rely on knowledge distillation to transfer features from fully observed trajectories to partially observed ones. This involves first training a fully observed model and then using a distillation process to create the final model. While effective, these approaches require multi-stage training, making the training process very expensive. Moreover, knowledge distillation can lead to a performance degradation of the model. In this paper, we introduce a Target-driven Self-Distillation method (TSD) for motion forecasting. Our method leverages accurately predicted targets to guide the model in making predictions under partial observation conditions. By employing self-distillation, the model learns from the feature distributions of both fully observed and partially observed trajectories during a single end-to-end training process. This enhances the model's ability to predict motion accurately in both fully observed and partially observed scenarios. We evaluate our method on multiple datasets and state-of-the-art motion forecasting models. Extensive experimental results demonstrate that our approach achieves significant performance improvements in both settings. To facilitate further research, we will release our code and model checkpoints.
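According to the Figure 2 caption, the partially observed branch used in this single-stage training is produced by randomly masking the fully observed history. The sketch below shows one way such masking could look. The masking scheme is an assumption, not the paper's: the function name `mask_history`, the `mask_ratio` parameter, the zero placeholder for hidden points, and the choice to always keep the most recent step visible are all hypothetical.

```python
import random


def mask_history(history, mask_ratio=0.5, rng=None):
    """Return (masked_history, valid_flags) for one agent's past trajectory.

    history:    list of (x, y) positions, oldest first
    mask_ratio: fraction of time steps to hide (assumed hyperparameter)
    rng:        optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    num_steps = len(history)
    # Assumption: the most recent step is never masked, so the agent's
    # current position stays available to the model.
    candidates = list(range(max(num_steps - 1, 0)))
    num_masked = min(len(candidates), int(mask_ratio * num_steps))
    masked_idx = set(rng.sample(candidates, num_masked))
    valid = [i not in masked_idx for i in range(num_steps)]
    # Hidden points are replaced by a zero placeholder; the valid flags
    # tell downstream code which entries are real observations.
    masked = [pt if ok else (0.0, 0.0) for pt, ok in zip(history, valid)]
    return masked, valid
```

In a training loop, the original `history` and the masked copy would both be passed through the same network, so no second model or second training stage is needed.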

Paper Structure

This paper contains 13 sections, 7 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Motion forecasting models often encounter occluded historical trajectories. Our proposed TSD enhances robustness by leveraging self-distillation and predicting targets to adapt to partially observed scenarios.
  • Figure 2: Overview of our proposed TSD. The entire training process is implemented end-to-end. We first apply random masking to the input trajectories to obtain partially observed trajectory branches, which are then fed into the network along with fully observed trajectories. Our proposed target generator produces sequential targets, which subsequently guide the trajectory prediction process. The features extracted from partially observed trajectories and fully observed trajectories are brought closer in distribution using MMD loss. The generation of partially observed trajectories occurs only during training, based on the original fully observed trajectories.
  • Figure 3: Qualitative results of HiVT and HiVT-TSD. The past trajectories are shown in yellow, the ground-truth trajectories are shown in red, and the predicted trajectories are shown in green. The white boxes are other vehicles around, and the red boxes are agents