Social-IWSTCNN: A Social Interaction-Weighted Spatio-Temporal Convolutional Neural Network for Pedestrian Trajectory Prediction in Urban Traffic Scenarios
Chi Zhang, Christian Berger, Marco Dozza
TL;DR
The paper addresses pedestrian trajectory prediction in urban traffic by learning data-driven social interaction weights from relative positions using a Social Interaction Extractor. It introduces Social-IWSTCNN, a model that combines spatial-social feature extraction, Temporal Convolutional Networks, and a Time-Extrapolator CNN to predict a bivariate Gaussian distribution for each pedestrian, with parameters $(\mu_x, \mu_y, \sigma_x, \sigma_y, \rho)$. On the Waymo Open Dataset, it outperforms state-of-the-art methods such as Social-LSTM, Social-GAN, and Social-STGCNN in ADE and FDE, while running faster than Social-STGCNN in both data preprocessing (≈$54.8\times$) and total test time (≈$4.7\times$). The work demonstrates robust performance in densely populated urban scenarios and highlights future opportunities to incorporate vehicle and environment cues to further improve prediction, especially in sparser contexts.
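To make the role of the five parameters $(\mu_x, \mu_y, \sigma_x, \sigma_y, \rho)$ concrete, the following is a minimal sketch of the log-density of a bivariate Gaussian at a point $(x, y)$. It uses only the standard textbook formula; the function name `bivariate_gaussian_logpdf` is hypothetical, and nothing here is taken from the authors' code or claims to reproduce their training objective.

```python
import math


def bivariate_gaussian_logpdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Log-density of a bivariate Gaussian at (x, y).

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    if sigma_x <= 0 or sigma_y <= 0 or not -1.0 < rho < 1.0:
        raise ValueError("need sigma_x > 0, sigma_y > 0 and -1 < rho < 1")

    # Standardised offsets from the predicted mean.
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y

    one_minus_rho2 = 1.0 - rho * rho
    # Quadratic form in the exponent of the density.
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_rho2
    # Log of the normalising constant 2*pi*sigma_x*sigma_y*sqrt(1 - rho^2).
    log_norm = math.log(2.0 * math.pi * sigma_x * sigma_y) + 0.5 * math.log(one_minus_rho2)

    return -0.5 * quad - log_norm
```

A higher value means the observed position is more plausible under the predicted distribution. Negating this quantity gives a negative log-likelihood, which is a common way to score such predictions.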
Abstract
Pedestrian trajectory prediction in urban scenarios is essential for automated driving. This task is challenging because the behavior of pedestrians is influenced by both their own past trajectories and their interactions with others. Previous research modeled these interactions with pooling mechanisms or by aggregating features with hand-crafted attention weights. In this paper, we present the Social Interaction-Weighted Spatio-Temporal Convolutional Neural Network (Social-IWSTCNN), which captures both spatial and temporal features. We propose a novel design, namely the Social Interaction Extractor, to learn the spatial and social interaction features of pedestrians. Most previous works were trained and evaluated on the ETH and UCY datasets, which include five scenes but do not cover urban traffic scenarios extensively. In this paper, we use the recently released large-scale Waymo Open Dataset, which includes 374 urban training scenes and 76 urban testing scenes, to compare the performance of our proposed algorithm with state-of-the-art (SOTA) models in urban traffic scenarios. The results show that our algorithm outperforms SOTA algorithms such as Social-LSTM, Social-GAN, and Social-STGCNN on both Average Displacement Error (ADE) and Final Displacement Error (FDE). Furthermore, our Social-IWSTCNN is 54.8 times faster in data pre-processing and 4.7 times faster in total test time than the current best SOTA algorithm, Social-STGCNN.
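The two evaluation metrics named above can be illustrated with a short sketch. ADE is taken here as the mean Euclidean distance between predicted and ground-truth positions over all pedestrians and predicted time steps, and FDE as the same distance at the final predicted step only. The function name `ade_fde` and the array layout `(num_pedestrians, num_steps, 2)` are assumptions made for this example. How point predictions are obtained from the predicted distributions is not covered, and the authors' evaluation protocol may differ.

```python
import numpy as np


def ade_fde(pred, truth):
    """Return (ADE, FDE) for trajectories of shape (num_pedestrians, num_steps, 2).

    Hypothetical helper for illustration only; not the authors' evaluation code.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if pred.shape != truth.shape or pred.ndim != 3 or pred.shape[-1] != 2:
        raise ValueError("expected two arrays of shape (N, T, 2)")

    # Euclidean distance per pedestrian and time step, shape (N, T).
    dist = np.linalg.norm(pred - truth, axis=-1)

    ade = float(dist.mean())         # average over all pedestrians and steps
    fde = float(dist[:, -1].mean())  # average over pedestrians at the last step
    return ade, fde
```

For example, a single pedestrian whose prediction is off by 5 m at the first step and exact at the last step would have an ADE of 2.5 m and an FDE of 0 m under this definition.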
