Distilling Knowledge for Short-to-Long Term Trajectory Prediction

Sourav Das, Guglielmo Camporese, Shaokang Cheng, Lamberto Ballan

TL;DR

The proposed Di-Long method is effective for long-term forecasting and achieves state-of-the-art performance on the Intersection Drone Dataset (inD) and the Stanford Drone Dataset (SDD).

Abstract

Long-term trajectory forecasting is an important and challenging problem in computer vision, machine learning, and robotics. One fundamental difficulty is that a trajectory becomes increasingly uncertain and unpredictable as the time horizon grows, which makes the problem harder. To overcome this issue, we propose Di-Long, a new method that distills a short-term trajectory forecaster (the teacher) to guide a student network for long-term trajectory prediction during training. Given a total sequence length that comprises the observation allowed to the student network and the complementary target sequence, we let the student and the teacher solve two different but related tasks defined over the same full trajectory: the student observes a short sequence and predicts a long trajectory, whereas the teacher observes a longer sequence and predicts the remaining short target trajectory. The teacher's task is less uncertain, and we use its accurate predictions to guide the student through our knowledge distillation framework, reducing long-term future uncertainty. Our experiments show that Di-Long is effective for long-term forecasting and achieves state-of-the-art performance on the Intersection Drone Dataset (inD) and the Stanford Drone Dataset (SDD).
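
The student/teacher split described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the parameter names `student_obs_len` and `teacher_obs_len`, and in particular the mean-squared-error form of the distillation term are assumptions made for illustration. The abstract only states that the teacher's predictions guide the student, not which loss is used.

```python
import numpy as np


def split_student_teacher(full_traj, student_obs_len, teacher_obs_len):
    """Split one full trajectory into (observation, target) pairs.

    full_traj has shape (T, 2) and holds (x, y) positions.
    The student observes a short prefix and must predict the long remainder;
    the teacher observes a longer prefix and predicts the short remainder.
    All names here are hypothetical, chosen for this sketch.
    """
    total_len = full_traj.shape[0]
    if not 0 < student_obs_len < teacher_obs_len < total_len:
        raise ValueError("need 0 < student_obs_len < teacher_obs_len < T")
    student = (full_traj[:student_obs_len], full_traj[student_obs_len:])
    teacher = (full_traj[:teacher_obs_len], full_traj[teacher_obs_len:])
    return student, teacher


def overlap_distillation_loss(student_pred, teacher_pred):
    """Assumed distillation term: mean squared distance on the shared steps.

    The teacher's short target coincides with the last steps of the
    student's long target, so only that tail is compared.
    """
    k = teacher_pred.shape[0]
    student_tail = student_pred[-k:]
    return float(np.mean(np.sum((student_tail - teacher_pred) ** 2, axis=-1)))


# Toy full trajectory: 10 steps moving along the x axis.
full = np.stack([np.arange(10.0), np.zeros(10)], axis=1)
(s_obs, s_tgt), (t_obs, t_tgt) = split_student_teacher(full, 3, 7)
print(s_obs.shape, s_tgt.shape, t_obs.shape, t_tgt.shape)
```

In this toy split the student sees 3 steps and predicts 7, while the teacher sees 7 steps and predicts the final 3, which are exactly the last 3 steps of the student's target.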

Paper Structure (13 sections, 9 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Di-Long Framework: at the bottom, we depict the full trajectory from which the observations and targets of the student and the teacher are extracted. At the top, we show the components of our framework: the student transformer decoder predicts the long-term trajectory, which is distilled from the teacher's prediction (based on longer sequences). The student's decoder is conditioned on the student goal module, which is distilled from the teacher's goal module.
  • Figure 2: Detailed overview of the Di-Long Model Components. Di-Long is composed of a student and a teacher, each having a goal module and a temporal module. The student processes short observations and predicts long trajectories; the teacher observes long trajectories and predicts short ones. The goal modules process 2D encoded sequences and semantic maps, producing goal and waypoint heatmaps. The temporal modules, given the observed trajectory, the goals, the semantic maps, and the social information, predict the future trajectory. Distillation is performed in both the goal and the temporal modules. See Sec. \ref{sec:our_method} for more details.
  • Figure 3: Ablation Study on the Di-Long Components. In GW-Heatmap, a 2D Gaussian heatmap of the goal/waypoint (GW) is appended to the semantic map, patchified, projected, and passed as the control input of the transformer decoder. GM Distill corresponds to goal module distillation, while TM Distill corresponds to temporal module distillation.
  • Figure 4: Performance Across Longer Time Horizons. These results are obtained on the inD dataset.
  • Figure 5: Optimal Teacher Observation Length. These results are obtained on the inD dataset.
  • ...and 1 more figure