ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction
Ruochen Li, Zhanxing Zhu, Tanqiu Qiao, Hubert P. H. Shum
TL;DR
ViTE tackles pedestrian trajectory prediction with two components: a Virtual Graph whose learnable virtual nodes efficiently capture long-range, high-order interactions, and a Mixture-of-Experts-based Expert Router that adaptively selects between one-hop and high-order interaction models based on social context. The framework enables context-aware, scalable reasoning without relying on deep GNN stacks, achieving state-of-the-art results on ETH/UCY, NBA, and SDD at low computational cost (few MACs and parameters). Extensive ablations show the benefits of adaptive expert routing and virtual nodes, and qualitative analyses illustrate that the dynamic expert weights align with scene complexity. Overall, ViTE offers a practical and effective solution for multi-agent trajectory reasoning, with improved generalization and efficiency across diverse settings.
Abstract
Pedestrian trajectory prediction is critical for ensuring safety in autonomous driving, surveillance systems, and urban planning applications. While early approaches primarily focus on one-hop pairwise relationships, recent studies attempt to capture high-order interactions by stacking multiple Graph Neural Network (GNN) layers. However, these approaches face a fundamental trade-off: insufficient layers may lead to under-reaching problems that limit the model's receptive field, while excessive depth can result in prohibitive computational costs. We argue that an effective model should be capable of adaptively modeling both explicit one-hop interactions and implicit high-order dependencies, rather than relying solely on architectural depth. To this end, we propose ViTE (Virtual graph Trajectory Expert router), a novel framework for pedestrian trajectory prediction. ViTE consists of two key modules: a Virtual Graph that introduces dynamic virtual nodes to model long-range and high-order interactions without deep GNN stacks, and an Expert Router that adaptively selects interaction experts based on social context using a Mixture-of-Experts design. This combination enables flexible and scalable reasoning across varying interaction patterns. Experiments on three benchmarks (ETH/UCY, NBA, and SDD) demonstrate that our method consistently achieves state-of-the-art performance, validating both its effectiveness and practical efficiency.
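The two modules described above can be illustrated with a minimal, hedged sketch. It is not the authors' implementation: the function names (`one_hop_expert`, `virtual_node_expert`, `route`), the dot-product attention, the soft two-way gate, and the fixed `virtual_keys` and `gate_w` values (standing in for learned parameters) are all assumptions made for illustration. The sketch shows only the general idea: a one-hop expert sees direct neighbours, a virtual-node expert lets distant agents exchange information in two steps, and a per-agent gate mixes the two.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def one_hop_expert(feats, adj):
    """Assumed one-hop expert: average the features of direct neighbours."""
    out = []
    for i, row in enumerate(adj):
        nbrs = [feats[j] for j, a in enumerate(row) if a and j != i]
        if nbrs:
            out.append(weighted_sum([1.0 / len(nbrs)] * len(nbrs), nbrs))
        else:
            out.append(list(feats[i]))  # isolated agent keeps its own features
    return out

def virtual_node_expert(feats, virtual_keys):
    """Assumed high-order expert: agents -> virtual nodes -> agents."""
    # Each virtual node pools every agent with attention weights.
    virtual_states = [
        weighted_sum(softmax([dot(key, f) for f in feats]), feats)
        for key in virtual_keys
    ]
    # Each agent reads back from the virtual nodes, so any two agents
    # are connected through a virtual node regardless of graph distance.
    return [
        weighted_sum(softmax([dot(f, v) for v in virtual_states]), virtual_states)
        for f in feats
    ]

def route(feats, adj, virtual_keys, gate_w):
    """Assumed soft router: per-agent mixture of the two experts."""
    local = one_hop_expert(feats, adj)
    high_order = virtual_node_expert(feats, virtual_keys)
    outputs, weights = [], []
    for f, l, h in zip(feats, local, high_order):
        w = softmax([dot(gate_w[0], f), dot(gate_w[1], f)])  # one logit per expert
        weights.append(w)
        outputs.append([w[0] * a + w[1] * b for a, b in zip(l, h)])
    return outputs, weights

# Three agents in a chain 0 - 1 - 2; agents 0 and 2 are not direct neighbours.
feats = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
virtual_keys = [[0.1, 0.0], [0.0, 0.1]]  # placeholder for learned virtual nodes
gate_w = [[0.2, -0.1], [-0.3, 0.4]]      # placeholder for learned gate weights
outputs, weights = route(feats, adj, virtual_keys, gate_w)
```

In this toy chain, changing agent 2's features leaves the one-hop output for agent 0 unchanged but alters its virtual-node output, which is the under-reaching gap the Virtual Graph is meant to close without stacking more layers.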
