Knowledge-aware Graph Transformer for Pedestrian Trajectory Prediction

Yu Liu, Yuexin Zhang, Kunming Li, Yongliang Qiao, Stewart Worrall, You-Fu Li, He Kong

TL;DR

This paper addresses cross-scene variability in pedestrian trajectory prediction by introducing a knowledge-aware graph transformer that models social interactions and temporal motion via spatial-temporal graphs and multi-head attention. It combines a spatial GNN and a temporal GNN with a time-extrapolator CNN, and trains with a hybrid loss that blends maximum likelihood with maximum mean discrepancy to align distributions across datasets. The approach achieves improved ADE/FDE and reduced prediction-robustness variance on ETH/UCY compared with strong baselines, demonstrating better generalization across scenes. The work advances practical trajectory prediction for autonomous systems by explicitly addressing domain heterogeneity and uncertainty through graph-based, attention-driven modeling.
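For readers unfamiliar with the ADE/FDE metrics mentioned above, the following is a minimal sketch of how average and final displacement error are commonly computed. The function name `ade_fde` and the nested-list trajectory layout are assumptions made for illustration; the paper's own evaluation code is not shown here. Work in this area often reports the best result over several sampled predictions, which this sketch does not do.

```python
import math

def ade_fde(pred, truth):
    """Return (ADE, FDE) for trajectories laid out as [pedestrian][timestep] -> (x, y).

    The layout and function name are illustrative assumptions, not the paper's API.
    """
    # Euclidean distance between prediction and ground truth at every time step.
    per_step = [
        [math.dist(p, t) for p, t in zip(pred_traj, true_traj)]
        for pred_traj, true_traj in zip(pred, truth)
    ]
    n_steps = sum(len(row) for row in per_step)
    # ADE: mean displacement over all pedestrians and all predicted steps.
    ade = sum(sum(row) for row in per_step) / n_steps
    # FDE: mean displacement at the final predicted step only.
    fde = sum(row[-1] for row in per_step) / len(per_step)
    return ade, fde

# One pedestrian walking along the x-axis; the prediction drifts upward in y.
truth = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]]
pred = [[(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]]
ade, fde = ade_fde(pred, truth)
```

In this toy case the per-step errors are 0, 1 and 2, so ADE is their mean and FDE is the last value.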

Abstract

Predicting pedestrian motion trajectories is crucial for path planning and motion control of autonomous vehicles. Accurately forecasting crowd trajectories is challenging due to the uncertain nature of human motion in different environments. For training, recent deep learning-based prediction approaches mainly use information such as trajectory history and interactions between pedestrians. This can limit prediction performance across scenarios, since the discrepancies between training datasets are not properly incorporated. To overcome this limitation, this paper proposes a graph transformer structure that improves prediction performance by capturing the differences between the various sites and scenarios contained in the datasets. In particular, a self-attention mechanism and a domain adaptation module are designed to improve the generalization ability of the model. Moreover, an additional metric considering cross-dataset sequences is introduced for training and performance evaluation purposes. The proposed framework is validated and compared against existing methods using popular public datasets, i.e., ETH and UCY. Experimental results demonstrate the improved performance of the proposed scheme.
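The TL;DR describes a hybrid loss that blends maximum likelihood with maximum mean discrepancy (MMD) to align distributions across datasets. The sketch below illustrates that general idea only. The Gaussian kernel, the `bandwidth` and `weight` parameters, the biased estimator, and the choice of which features are compared are all assumptions; the summary does not specify how the paper implements them.

```python
import math

def gaussian_kernel(a, b, bandwidth=1.0):
    # Assumed kernel choice: Gaussian (RBF) on the squared Euclidean distance.
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))
    # Within-sample similarity of each set minus twice the cross-sample similarity.
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

def hybrid_loss(nll, feats_a, feats_b, weight=1.0):
    # Hypothetical blend: likelihood term plus a weighted distribution-alignment term.
    return nll + weight * mmd_squared(feats_a, feats_b)

# Toy features standing in for two datasets (values are made up for illustration).
scene_a = [(0.0, 0.0), (1.0, 0.0)]
scene_b = [(5.0, 5.0), (6.0, 5.0)]
far_apart = mmd_squared(scene_a, scene_b)
identical = mmd_squared(scene_a, scene_a)
loss = hybrid_loss(2.0, scene_a, scene_b, weight=0.5)
```

The MMD term is near zero when the two feature samples look alike and grows as they separate, so adding it to the likelihood term penalizes representations that differ strongly between datasets.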

Paper Structure (20 sections, 13 equations, 2 figures, 4 tables)

This paper contains 20 sections, 13 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: The structure of the proposed method.
  • Figure 2: Visualization of the predicted trajectory distributions across five main scenes. For each example, 300 samples are recorded and their densities are visualized. Observed trajectories are shown as solid lines, and dashed lines indicate the ground-truth future paths.