Unified Spatial-Temporal Edge-Enhanced Graph Networks for Pedestrian Trajectory Prediction

Ruochen Li, Tanqiu Qiao, Stamos Katsigiannis, Zhanxing Zhu, Hubert P. H. Shum

TL;DR

UniEdge tackles pedestrian trajectory prediction by unifying spatial-temporal interactions into a single step, alleviating information loss from multi-step aggregations. It introduces a dual-graph E2E-N2N-GCN to jointly model explicit node interactions and implicit edge-to-edge influence, followed by a transformer encoder predictor to capture global temporal dependencies. The approach yields state-of-the-art or highly competitive results on ETH/UCY and SDD datasets, with comprehensive ablations validating the contribution of each component. This work enhances prediction accuracy in both sparse and dense crowd scenes and offers practical benefits for safety-critical systems that rely on accurate pedestrian motion forecasting. The combination of unified ST graphs, edge-aware propagation, and global temporal modeling provides a robust, scalable framework for real-time trajectory prediction in complex environments.

Abstract

Pedestrian trajectory prediction aims to forecast future movements based on historical paths. Spatial-temporal (ST) methods often separately model spatial interactions among pedestrians and temporal dependencies of individuals. They overlook the direct impacts of interactions among different pedestrians across various time steps (i.e., high-order cross-time interactions). This limits their ability to capture ST inter-dependencies and hinders prediction performance. To address these limitations, we propose UniEdge with three major designs. Firstly, we introduce a unified ST graph data structure that simplifies high-order cross-time interactions into first-order relationships, enabling the learning of ST inter-dependencies in a single step. This avoids the information loss caused by multi-step aggregation. Secondly, traditional GNNs focus on aggregating pedestrian node features, neglecting the propagation of implicit interaction patterns encoded in edge features. We propose the Edge-to-Edge-Node-to-Node Graph Convolution (E2E-N2N-GCN), a novel dual-graph network that jointly models explicit N2N social interactions among pedestrians and implicit E2E influence propagation across these interaction patterns. Finally, to overcome the limited receptive fields of auto-regressive architectures and their difficulty in capturing long-range dependencies, we introduce a transformer encoder-based predictor that enables global modeling of temporal correlation. UniEdge outperforms state-of-the-art methods on multiple datasets, including ETH, UCY, and SDD.
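
To make the first design more concrete, the following is a minimal sketch of one way a unified ST graph could be assembled, assuming each (time step, pedestrian) pair becomes its own node and every pair of nodes is linked directly. The function name `unified_st_adjacency`, the `(T, N, 2)` input layout, and the inverse-distance edge weighting are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def unified_st_adjacency(traj):
    """Build a dense adjacency over all (time, pedestrian) nodes.

    traj: array of shape (T, N, 2) holding xy positions of N pedestrians
    over T observed time steps. Node index is t * N + i.
    """
    traj = np.asarray(traj, dtype=float)
    t_steps, n_peds, _ = traj.shape
    nodes = traj.reshape(t_steps * n_peds, 2)         # flatten time and pedestrian axes
    diff = nodes[:, None, :] - nodes[None, :, :]      # pairwise displacement vectors
    dist = np.linalg.norm(diff, axis=-1)              # pairwise Euclidean distances
    adj = 1.0 / (1.0 + dist)                          # hypothetical weighting (assumption)
    np.fill_diagonal(adj, 0.0)                        # no self-loops
    return adj

# Toy scene: 3 time steps, 2 pedestrians.
traj = np.zeros((3, 2, 2))
traj[:, 0, 0] = [0.0, 1.0, 2.0]   # pedestrian 0 walks along the x axis
traj[:, 1, 1] = 1.0               # pedestrian 1 stands still at (0, 1)
adj = unified_st_adjacency(traj)
```

In this sketch the entry `adj[0, 5]` links pedestrian 0 at the first step to pedestrian 1 at the last step with a single edge, which is the kind of cross-time relationship that separate spatial and temporal graphs would only reach through several aggregation steps.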

Paper Structure

This paper contains 30 sections, 13 equations, 12 figures, 9 tables, 1 algorithm.

Figures (12)

  • Figure 1: Motivation Illustration. (a) Real-world pedestrian trajectories over multiple time frames. (b) Existing ST approaches separately model the spatial interactions among pedestrians and temporal dependencies of individuals. (c) Our unified ST graph integrates ST inter-dependencies and simplifies high-order cross-time interactions into first-order relationships.
  • Figure 2: Illustration of graph learning procedures. (a) Node-to-Node (N2N), (b) Edge-to-Node (E2N), and (c) our novel dual-graph design, which combines the N2N and Edge-to-Edge (E2E) paradigms.
  • Figure 3: Overview of the proposed UniEdge. (a) Construction of patch-based unified ST graphs that simplify cross-time interactions into first-order relationships, (b) Edge-to-Edge-Node-to-Node Graph Convolution (E2E-N2N-GCN) that jointly processes N2N interactions and E2E influence propagation, and (c) Transformer Encoder-based trajectory predictor.
  • Figure 4: Comparison of effective resistance $(R_{ij})$ between traditional ST approach (left, $R_{ij} = 1.50$) and our unified ST graph (right, $R_{ij} = 0.27$). Lower $R_{ij}$ indicates better message propagation efficiency.
  • Figure 5: Illustration of edge graph construction from a unified ST graph using the first-order boundary operator $\mathcal{B}_{1}$. Nodes are represented by numbers, and edges connecting these nodes are labeled with letters. Applying the first-order boundary operator transforms each edge into a node in the edge graph, with connections formed based on shared nodes in the original graph.
  • ...and 7 more figures
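
Two notions from the captions above, effective resistance (Figure 4) and the edge graph (Figure 5), can be illustrated on toy graphs. The sketch below uses the standard identity $R_{ij} = L^{+}_{ii} + L^{+}_{jj} - 2L^{+}_{ij}$, where $L^{+}$ is the pseudoinverse of the graph Laplacian, and builds an edge graph by linking edges that share a node. The toy graphs are not the ones drawn in the figures, so the values differ from the reported $R_{ij}$, and the unsigned incidence matrix used here is an assumed stand-in for the boundary operator $\mathcal{B}_{1}$; the function names are hypothetical.

```python
import numpy as np

def effective_resistance(adj, i, j):
    """Effective resistance between nodes i and j of a connected undirected graph."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj       # graph Laplacian L = D - A
    lap_pinv = np.linalg.pinv(lap)             # Moore-Penrose pseudoinverse of L
    return lap_pinv[i, i] + lap_pinv[j, j] - 2.0 * lap_pinv[i, j]

def edge_graph_adjacency(num_nodes, edges):
    """Adjacency of the edge graph: two edges are linked if they share a node."""
    b1 = np.zeros((num_nodes, len(edges)))     # unsigned node-edge incidence (assumption)
    for k, (u, v) in enumerate(edges):
        b1[u, k] = 1.0
        b1[v, k] = 1.0
    shared = b1.T @ b1                         # entry (a, b) counts shared endpoints
    np.fill_diagonal(shared, 0.0)              # drop each edge's link to itself
    return (shared > 0).astype(int)

# Path 0-1-2: nodes 0 and 2 communicate only through node 1.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
# Adding a direct 0-2 edge turns the path into a triangle.
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
r_path = effective_resistance(path, 0, 2)
r_triangle = effective_resistance(triangle, 0, 2)

# Edge graph of the path 0-1-2-3 with edges a=(0,1), b=(1,2), c=(2,3).
edge_adj = edge_graph_adjacency(4, [(0, 1), (1, 2), (2, 3)])
```

In this toy case the direct 0-2 edge lowers the resistance between the two end nodes from 2 to 2/3, which mirrors the qualitative claim that first-order connections ease message propagation. In the edge graph, edges a and b become neighbouring nodes because they share node 1, while a and c stay unconnected.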