Unified Spatial-Temporal Edge-Enhanced Graph Networks for Pedestrian Trajectory Prediction
Ruochen Li, Tanqiu Qiao, Stamos Katsigiannis, Zhanxing Zhu, Hubert P. H. Shum
TL;DR
UniEdge addresses pedestrian trajectory prediction by unifying spatial-temporal (ST) interactions into a single aggregation step, which avoids the information loss of multi-step aggregation. It introduces a dual-graph network, E2E-N2N-GCN, that jointly models explicit node-to-node (N2N) interactions and implicit edge-to-edge (E2E) influence propagation, followed by a transformer encoder-based predictor that captures global temporal dependencies. The approach achieves state-of-the-art or highly competitive results on the ETH/UCY and SDD datasets, and ablation studies validate the contribution of each component. The gains hold in both sparse and dense crowd scenes, which matters for safety-critical systems that rely on accurate pedestrian motion forecasting in complex environments.
Abstract
Pedestrian trajectory prediction aims to forecast future movements based on historical paths. Spatial-temporal (ST) methods often model spatial interactions among pedestrians and temporal dependencies of individuals separately. As a result, they overlook the direct impact of interactions among different pedestrians across different time steps (i.e., high-order cross-time interactions), which limits their ability to capture ST inter-dependencies and hinders prediction performance. To address these limitations, we propose UniEdge with three major designs. First, we introduce a unified ST graph data structure that simplifies high-order cross-time interactions into first-order relationships, enabling the learning of ST inter-dependencies in a single step. This avoids the information loss caused by multi-step aggregation. Second, traditional GNNs focus on aggregating pedestrian node features and neglect the propagation of implicit interaction patterns encoded in edge features. We propose the Edge-to-Edge-Node-to-Node Graph Convolution (E2E-N2N-GCN), a novel dual-graph network that jointly models explicit N2N social interactions among pedestrians and implicit E2E influence propagation across these interaction patterns. Finally, to overcome the limited receptive fields of auto-regressive architectures and their difficulty in capturing long-range dependencies, we introduce a transformer encoder-based predictor that enables global modeling of temporal correlations. UniEdge outperforms state-of-the-art methods on multiple datasets, including ETH, UCY, and SDD.
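To make the first design point concrete, the following minimal sketch contrasts a conventional ST graph, where spatial and temporal edges are kept separate, with a unified ST graph in which every (pedestrian, time step) pair is a node and cross-time interactions are direct edges. This is an illustration under simplifying assumptions, not the authors' implementation: the function names, the fully connected unified graph, and the unweighted boolean adjacency are all hypothetical choices, and the abstract does not specify window sizes or edge weights.

```python
from collections import deque

import numpy as np


def node_index(ped, step, num_peds):
    """Flatten a (pedestrian, time step) pair into one node id (assumed layout)."""
    return step * num_peds + ped


def separate_st_adjacency(num_peds, num_steps):
    """Conventional view: spatial edges within a frame, temporal edges along one track."""
    n = num_peds * num_steps
    adj = np.zeros((n, n), dtype=bool)
    for t in range(num_steps):
        for i in range(num_peds):
            u = node_index(i, t, num_peds)
            for j in range(num_peds):
                if j != i:  # same frame, different pedestrian
                    adj[u, node_index(j, t, num_peds)] = True
            for s in range(num_steps):
                if s != t:  # same pedestrian, different frame
                    adj[u, node_index(i, s, num_peds)] = True
    return adj


def unified_st_adjacency(num_peds, num_steps):
    """Unified view (assumed fully connected): any two ST nodes share a direct edge."""
    n = num_peds * num_steps
    return ~np.eye(n, dtype=bool)


def hop_distance(adj, src, dst):
    """Breadth-first search: number of aggregation steps needed to reach dst from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in np.flatnonzero(adj[u]):
            v = int(v)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return -1  # unreachable


num_peds, num_steps = 3, 4
src = node_index(0, 0, num_peds)  # pedestrian 0 at step 0
dst = node_index(1, 2, num_peds)  # pedestrian 1 at step 2

hops_separate = hop_distance(separate_st_adjacency(num_peds, num_steps), src, dst)
hops_unified = hop_distance(unified_st_adjacency(num_peds, num_steps), src, dst)
print(hops_separate)  # 2: one spatial hop plus one temporal hop
print(hops_unified)   # 1: the cross-time interaction is a first-order edge
```

In the separate graph, information from pedestrian 0 at step 0 reaches pedestrian 1 at step 2 only after two aggregation steps, whereas the unified graph exposes the same interaction in a single step, which is the property the abstract describes as simplifying high-order cross-time interactions into first-order relationships.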
