SocialFormer: Social Interaction Modeling with Edge-enhanced Heterogeneous Graph Transformers for Trajectory Prediction
Zixu Wang, Zhigang Sun, Juergen Luettin, Lavdim Halilaj
TL;DR
SocialFormer addresses trajectory prediction for autonomous driving by modeling rich social interactions and road topology. It introduces an edge-enhanced heterogeneous graph transformer (EHGT) to encode edge attributes within a heterogeneous scene graph, coupled with a GRU-based temporal encoder and a four-part information fusion module that forms a comprehensive scene representation. A multimodal trajectory predictor samples a Gaussian latent variable $z$ to produce $k$ candidate future trajectories $\hat{Y}_{1:t_f}^{k}$, complemented by a graph-based prediction $\tilde{Y}_{1:t_f}^{k}$; the training objective combines multiple loss terms. Experiments on the nuScenes benchmark show state-of-the-art accuracy, including robustness in scenes with sparse semantic relations, underscoring the value of explicit agent interactions and lane topology in real-world driving settings.
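The latent-variable sampling described above can be illustrated with a minimal sketch. It assumes a fused scene embedding is already available and uses a fixed random linear map as a stand-in for a trained decoder; the function name `sample_trajectories`, the latent dimension, and the displacement-integration step are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def sample_trajectories(scene_embedding, k, t_f, rng, latent_dim=8):
    """Decode k candidate futures from one scene embedding (illustrative only)."""
    d = scene_embedding.shape[0]
    # Hypothetical decoder: random weights standing in for a trained network.
    w = rng.standard_normal((d + latent_dim, t_f * 2)) * 0.1
    trajectories = []
    for _ in range(k):
        z = rng.standard_normal(latent_dim)        # z ~ N(0, I)
        h = np.concatenate([scene_embedding, z])   # condition the decoder on z
        offsets = (h @ w).reshape(t_f, 2)          # per-step (dx, dy) displacements
        trajectories.append(np.cumsum(offsets, axis=0))  # integrate to positions
    return np.stack(trajectories)                  # shape (k, t_f, 2)

rng = np.random.default_rng(0)
scene = rng.standard_normal(16)                    # placeholder scene embedding
y_hat = sample_trajectories(scene, k=5, t_f=12, rng=rng)
print(y_hat.shape)  # (5, 12, 2)
```

Each draw of $z$ yields a different trajectory from the same scene embedding, which is what makes the prediction multimodal.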
Abstract
Accurate trajectory prediction is crucial for ensuring safe and efficient autonomous driving. However, most existing methods overlook complex interactions between traffic participants that often govern their future trajectories. In this paper, we propose SocialFormer, an agent interaction-aware trajectory prediction method that leverages the semantic relationship between the target vehicle and surrounding vehicles by making use of the road topology. We also introduce an edge-enhanced heterogeneous graph transformer (EHGT) as the aggregator in a graph neural network (GNN) to encode the semantic and spatial agent interaction information. Additionally, we introduce a temporal encoder based on gated recurrent units (GRU) to model the temporal social behavior of agent movements. Finally, we present an information fusion framework that integrates agent encoding, lane encoding, and agent interaction encoding for a holistic representation of the traffic scene. We evaluate SocialFormer for the trajectory prediction task on the popular nuScenes benchmark and achieve state-of-the-art performance.
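To make the idea of edge-enhanced aggregation concrete, the sketch below shows one plausible way to let edge attributes influence attention over a node's neighbors. It is an assumption-laden illustration: projected edge attributes are added to both keys and values, a single attention head is used, and the type-specific projections of a heterogeneous graph are omitted. The function name, parameter names, and dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np

def edge_enhanced_attention(h_target, h_neighbors, edge_attrs, params):
    """Aggregate neighbor features for one target node, using edge attributes.

    h_target:    (d_in,)   target node features
    h_neighbors: (n, d_in) neighbor node features
    edge_attrs:  (n, d_e)  attributes of the edges neighbor -> target
    """
    q = h_target @ params["w_q"]               # (d,) query from the target node
    e = edge_attrs @ params["w_e"]             # (n, d) projected edge attributes
    k = h_neighbors @ params["w_k"] + e        # keys carry edge information
    v = h_neighbors @ params["w_v"] + e        # messages carry edge information
    scores = k @ q / np.sqrt(q.shape[0])       # scaled dot-product scores, (n,)
    scores = scores - scores.max()             # stabilize the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ v, alpha                    # aggregated message (d,), weights (n,)

rng = np.random.default_rng(0)
d_in, d_e, d, n = 6, 3, 4, 5                   # hypothetical sizes
params = {
    "w_q": rng.standard_normal((d_in, d)),
    "w_k": rng.standard_normal((d_in, d)),
    "w_v": rng.standard_normal((d_in, d)),
    "w_e": rng.standard_normal((d_e, d)),
}
message, alpha = edge_enhanced_attention(
    rng.standard_normal(d_in),
    rng.standard_normal((n, d_in)),
    rng.standard_normal((n, d_e)),
    params,
)
print(message.shape, alpha.shape)  # (4,) (5,)
```

Because the edge term enters the keys, two neighbors with identical node features can still receive different attention weights when their relation to the target differs.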
