ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction

Ruochen Li, Zhanxing Zhu, Tanqiu Qiao, Hubert P. H. Shum

TL;DR

ViTE tackles pedestrian trajectory prediction by combining two key innovations: a Virtual Graph with learnable virtual nodes that efficiently captures long-range, high-order interactions, and a Mixture-of-Experts-based Expert Router that adaptively selects between one-hop and high-order interaction models based on social context. The framework enables context-aware, scalable reasoning without relying on deep GNN stacks, achieving state-of-the-art results on ETH/UCY, NBA, and SDD while remaining efficient (low MAC and parameter counts). Extensive ablations confirm the benefits of adaptive expert routing and virtual nodes, and qualitative analyses show expert weights shifting dynamically with scene complexity. Overall, ViTE offers a practical and effective solution for multi-agent trajectory reasoning, with improved generalization and efficiency across diverse settings.
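A minimal sketch of how soft Mixture-of-Experts routing could fuse the two interaction experts for each pedestrian. It assumes a linear gate over a per-agent context feature followed by a softmax, and uses random arrays as stand-ins for the one-hop and high-order expert outputs. The names `expert_router`, `W_gate`, and `b_gate` are hypothetical; ViTE's actual router (its context features, gate architecture, and whether it uses sparse top-k selection) may differ.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def expert_router(context, expert_outputs, W_gate, b_gate):
    """Soft Mixture-of-Experts fusion (hypothetical sketch).

    context:        (N, C) per-agent social-context features
    expert_outputs: list of K arrays, each (N, D), e.g. [one_hop, high_order]
    W_gate, b_gate: (C, K) and (K,) parameters of an assumed linear gate
    Returns the fused (N, D) features and the (N, K) routing weights.
    """
    weights = softmax(context @ W_gate + b_gate)         # (N, K), each row sums to 1
    stacked = np.stack(expert_outputs, axis=1)           # (N, K, D)
    fused = (weights[:, :, None] * stacked).sum(axis=1)  # weighted sum over experts
    return fused, weights

rng = np.random.default_rng(0)
N, C, D = 4, 8, 16                    # agents, context dim, feature dim (illustrative)
context = rng.normal(size=(N, C))
one_hop = rng.normal(size=(N, D))     # stand-in for the one-hop expert output
high_order = rng.normal(size=(N, D))  # stand-in for the virtual-graph expert output
W_gate = rng.normal(size=(C, 2))
b_gate = np.zeros(2)

fused, weights = expert_router(context, [one_hop, high_order], W_gate, b_gate)
print(fused.shape, weights.round(2))
```

Soft weighting keeps both experts active and differentiable; a sparse top-k gate would instead skip low-weight experts to save compute.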

Abstract

Pedestrian trajectory prediction is critical for ensuring safety in autonomous driving, surveillance systems, and urban planning applications. While early approaches primarily focus on one-hop pairwise relationships, recent studies attempt to capture high-order interactions by stacking multiple Graph Neural Network (GNN) layers. However, these approaches face a fundamental trade-off: insufficient layers may lead to under-reaching problems that limit the model's receptive field, while excessive depth can result in prohibitive computational costs. We argue that an effective model should be capable of adaptively modeling both explicit one-hop interactions and implicit high-order dependencies, rather than relying solely on architectural depth. To this end, we propose ViTE (Virtual graph Trajectory Expert router), a novel framework for pedestrian trajectory prediction. ViTE consists of two key modules: a Virtual Graph that introduces dynamic virtual nodes to model long-range and high-order interactions without deep GNN stacks, and an Expert Router that adaptively selects interaction experts based on social context using a Mixture-of-Experts design. This combination enables flexible and scalable reasoning across varying interaction patterns. Experiments on three benchmarks (ETH/UCY, NBA, and SDD) demonstrate that our method consistently achieves state-of-the-art performance, validating both its effectiveness and practical efficiency.
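The depth trade-off can be made concrete by counting hops: a message-passing GNN needs at least as many layers as the hop distance between two agents before one can influence the other. The sketch below assumes pedestrians interact only with their neighbors along a line and that each virtual node is linked to every real node. ViTE's virtual nodes are learnable and dynamic, so only the connectivity effect is modeled here; the function names are hypothetical.

```python
from collections import deque

def chain_edges(n):
    # Pedestrians 0..n-1 where only adjacent neighbors interact (one-hop edges).
    return [(i, i + 1) for i in range(n - 1)]

def add_virtual_nodes(n, edges, num_virtual):
    # Hypothetical construction: each virtual node links to every real node.
    extra = [(n + k, i) for k in range(num_virtual) for i in range(n)]
    return n + num_virtual, edges + extra

def hop_distance(num_nodes, edges, src, dst):
    # Breadth-first search on the undirected graph; returns -1 if unreachable.
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [-1] * num_nodes
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist[dst]

n = 6
edges = chain_edges(n)
print("layers needed, chain only:", hop_distance(n, edges, 0, n - 1))  # 5
total, edges_v = add_virtual_nodes(n, edges, num_virtual=1)
print("layers needed, with a virtual node:", hop_distance(total, edges_v, 0, n - 1))  # 2
```

With one virtual node linked to everyone, any two pedestrians are at most 2 hops apart regardless of crowd size, which is why virtual nodes can stand in for a deep GNN stack.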

Paper Structure

This paper contains 25 sections, 13 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Comparison of interaction modeling strategies. (a) Traditional methods capture only one-hop interactions. (b) Existing methods stack multiple GNN layers to model high-order dependencies. (c) Our method introduces virtual nodes to capture high-order interactions efficiently.
  • Figure 2: Overview of ViTE. Given pedestrian trajectories, we first construct interaction graphs. In (a), the high-order interaction expert captures indirect, long-range dependencies via Virtual Graph Learning, while (b) illustrates the one-hop expert modeling direct interactions. These expert outputs are then dynamically fused by a MoE-based Expert Router, as depicted in (c), enabling context-aware routing of graph information. Finally, an MLP-based decoder outputs future trajectories for each pedestrian.
  • Figure 3: Comparison of effective resistance $(R_{ae})$ between a standard chain graph (left, $R_{ae} = 4.0$) and a virtual-node-enhanced graph structure (right, $R_{ae} = 1.2$). Lower effective resistance indicates more efficient message propagation and improved global connectivity.
  • Figure 4: Qualitative results on ETH/UCY datasets. Historical trajectories are in blue, ground-truth trajectories are in red, and predicted trajectories are in green.
  • Figure 5: Qualitative results on the NBA dataset. Historical trajectories are in blue, ground-truth trajectories are in red, and predicted trajectories are in green.
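The effective-resistance values quoted for Figure 3 can be checked numerically with the standard Laplacian-pseudoinverse formula $R_{uv} = (e_u - e_v)^\top L^{+} (e_u - e_v)$. This sketch assumes unit edge weights, five nodes a to e on a chain, and, for the right-hand graph, a single virtual node linked to all five real nodes. That assumed layout reproduces both numbers, but the figure's exact virtual-node structure may differ.

```python
import numpy as np

def laplacian(num_nodes, edges):
    # Unweighted graph Laplacian L = D - A (each edge acts as a unit resistor).
    L = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    return L

def effective_resistance(num_nodes, edges, u, v):
    # R_uv = (e_u - e_v)^T L^+ (e_u - e_v), with L^+ the Moore-Penrose pseudoinverse.
    L_pinv = np.linalg.pinv(laplacian(num_nodes, edges))
    x = np.zeros(num_nodes)
    x[u], x[v] = 1.0, -1.0
    return float(x @ L_pinv @ x)

# Nodes a..e are indices 0..4 on a chain.
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
# Assumed virtual-node layout: one extra node (index 5) linked to every real node.
with_virtual = chain + [(5, i) for i in range(5)]

print(round(effective_resistance(5, chain, 0, 4), 3))         # 4.0
print(round(effective_resistance(6, with_virtual, 0, 4), 3))  # 1.2
```

The chain value is just four unit resistors in series; the virtual node adds a parallel two-hop path between every pair of pedestrians, which lowers the resistance between the endpoints.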