
Pedestrian Trajectory Prediction Based on Social Interactions Learning With Random Weights

Jiajia Xie, Sheng Zhang, Beihao Xia, Zhu Xiao, Hongbo Jiang, Siwang Zhou, Zheng Qin, Hongyang Chen

TL;DR

Pedestrian trajectory prediction in crowded scenes is challenged by implicit social interactions that are not well captured by fixed edge weights. We propose DTGAN, which extends GANs to graph sequence data with random edge weights to automatically learn interactions and produce multi-modal future trajectories; the Generator uses SPE, GAT with random weights, a Temporal Convolutional Network, and CNN-based decoding, while the Discriminator uses SPE-LSTM-FC for realism scoring. We explore multiple task losses (MSE, Gaussian NLL, Uniform likelihood) in conjunction with the WGAN objective to balance realism and diversity, and demonstrate state-of-the-art ADE/FDE and AMD/AMV on ETH/UCY datasets, with DTGAN-G achieving best distributional metrics. DTGAN also shows robustness to random weight initializations and benefits from ablations that highlight the value of graph-based attention and temporal modeling, suggesting practical impact for safer autonomous navigation in dynamic environments.

Abstract

Pedestrian trajectory prediction is a critical technology in the evolution of self-driving cars toward complete artificial intelligence. In recent years, modeling social interactions from pedestrian trajectories has attracted great interest as a route to more accurate trajectory prediction. However, existing methods for modeling pedestrian social interactions rely on pre-defined rules and struggle to capture non-explicit social interactions. In this work, we propose a novel framework named DTGAN, which extends Generative Adversarial Networks (GANs) to graph sequence data, with the primary objective of automatically capturing implicit social interactions and predicting pedestrian trajectories precisely. DTGAN incorporates random weights within each graph to eliminate the need for pre-defined interaction rules. We further enhance the performance of DTGAN by exploring diverse task loss functions during adversarial training, which yields improvements of 16.7% and 39.3% on the ADE and FDE metrics, respectively. The effectiveness and accuracy of our framework are verified on two public datasets (ETH and UCY). The experimental results show that DTGAN achieves superior performance and captures pedestrians' intentions well.
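
For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how ADE (average displacement error) and FDE (final displacement error) are commonly computed for a single pedestrian, together with a best-of-K selection. The function names `displacement_errors` and `best_of_k` are hypothetical, and taking the minimum ADE and minimum FDE independently across samples is an assumption; the paper's exact evaluation protocol may differ.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one pedestrian.

    pred and truth are equal-length lists of (x, y) positions,
    one entry per predicted time step.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and equal length")
    # Euclidean distance between prediction and ground truth at each step.
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # mean error over all steps
    fde = dists[-1]                # error at the final step only
    return ade, fde


def best_of_k(samples, truth):
    """Lowest ADE and lowest FDE over K sampled trajectories.

    Assumption: the two minima are taken independently of each other.
    """
    errors = [displacement_errors(s, truth) for s in samples]
    return min(e[0] for e in errors), min(e[1] for e in errors)


truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
drifting = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
offset = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
ade, fde = displacement_errors(drifting, truth)
best_ade, best_fde = best_of_k([drifting, offset], truth)
```

In this toy case the drifting sample has per-step errors of 0, 1, and 2, so its ADE is 1.0 and its FDE is 2.0, while the offset sample lowers the best FDE to 1.0.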

Paper Structure (21 sections, 17 equations, 7 figures, 6 tables, 1 algorithm)

Figures (7)

  • Figure 1: Pedestrian interactions graph representation. We model pedestrian trajectories at each time $t$ as graph-structured data, using different colored nodes to represent different pedestrians and edges to represent social interactions among pedestrians.
  • Figure 2: The DTGAN framework consists of a Generator and a Discriminator. A Spatial Embedding Layer (SPE) embeds node features, and Random Weights (RW) randomly weight the adjacency matrix of each graph. The Generator takes a set of graphs with node features and a random weights matrix as input. It uses a Graph Attention Network (GAT) to capture hidden node features and learn social interactions among pedestrians in the scene. A Temporal Convolutional Network (TCN), with the temporal dimension as the input channel, then extracts time-sequence information, and multi-layer Convolutional Neural Networks (CNNs) predict future trajectories. Finally, a decoder is employed to obtain the future trajectories. The Discriminator takes both ground-truth and predicted trajectories as input and classifies them as real or fake.
  • Figure 3: Illustration of single trajectory prediction. Trajectory points are plotted in a plane coordinate system, where x is the horizontal coordinate, y is the vertical coordinate, and each point corresponds to one time step. Both models are evaluated using the best of 20 samples. Note that the coordinate origin is not exactly the same in each subplot, and an intersection of trajectories does not necessarily indicate a collision.
  • Figure 4: Illustration of trajectory distribution prediction. Each pedestrian is assigned a different color. The colored area of each ellipse represents the predicted probability density distribution. The wider the ellipse, the greater the variance.
  • Figure 5: Average ADE/FDE over all datasets for different weight-generation methods.
  • ...and 2 more figures
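
The random-weights step described in the Figure 2 caption can be illustrated with a minimal sketch: one randomly weighted adjacency matrix per time step for a scene of N pedestrians. This is not the authors' implementation. The function names, the fully connected graph, the inclusion of self-loops, and the uniform distribution on [0, 1) are all assumptions; the source does not specify how the weights are drawn, and Figure 5 indicates that several generation methods were compared.

```python
import random


def random_weight_adjacency(num_pedestrians, rng):
    """Build one N x N adjacency matrix with random edge weights.

    Assumptions (not from the source): the graph is fully connected,
    self-loops are included, and weights are drawn from U[0, 1).
    """
    return [
        [rng.random() for _ in range(num_pedestrians)]
        for _ in range(num_pedestrians)
    ]


def random_weight_sequence(num_steps, num_pedestrians, seed=0):
    """Build one independently weighted graph per observed time step."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        random_weight_adjacency(num_pedestrians, rng)
        for _ in range(num_steps)
    ]


# Hypothetical scene: 8 observed time steps, 3 pedestrians.
graphs = random_weight_sequence(num_steps=8, num_pedestrians=3)
```

Because no distance- or rule-based weighting is imposed, any structure in the interactions has to be learned downstream, which is the role the caption assigns to the GAT.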