TrafficBots V1.5: Traffic Simulation via Conditional VAEs and Transformers with Relative Pose Encoding
Zhejun Zhang, Christos Sakaridis, Luc Van Gool
TL;DR
TrafficBots V1.5 is a baseline for closed-loop multi-agent traffic simulation that integrates the CVAE-conditioned TrafficBots policy with the HPTR Transformer framework, using a pairwise-relative, KNARPE-based representation. It replaces temporal RNNs with stacked history and relative pose encoding, which gives scalable multi-agent forecasting while conditioning behavior on each agent's destination and personality. Training uses scheduled sampling and KL regularization with free nats, and inference filters the sampled scenarios to reduce collisions. The resulting realism is at baseline level and trails GPT-based approaches on key metrics. The work provides an extensible baseline and highlights practical trade-offs between realism, collision avoidance, and scalability in traffic simulation.
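As an illustration of KL regularization with free nats, the sketch below computes the KL divergence between two diagonal Gaussians and clamps it from below, so the penalty stops pushing the posterior toward the prior once the divergence falls under a threshold. The function names, the clamp on the summed KL (rather than per dimension), and the default threshold of 1.0 are assumptions for illustration, not details taken from the report.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total


def kl_with_free_nats(kl, free_nats=1.0):
    """Clamp the KL term from below (hypothetical threshold).

    Below the threshold the loss is constant, so in an autograd framework
    no gradient would flow through the KL term.
    """
    return max(kl, free_nats)


# Posterior shifted by one standard deviation from a unit-variance prior.
kl = gaussian_kl([1.0], [1.0], [0.0], [1.0])
loss_kl = kl_with_free_nats(kl, free_nats=1.0)
```

With an autograd library the same clamp would typically be applied to a tensor, which keeps the latent from collapsing onto the prior early in training.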
Abstract
In this technical report we present TrafficBots V1.5, a baseline method for the closed-loop simulation of traffic agents. TrafficBots V1.5 achieves baseline-level performance and a 3rd place ranking in the Waymo Open Sim Agents Challenge (WOSAC) 2024. It is a simple baseline that combines TrafficBots, a CVAE-based multi-agent policy conditioned on each agent's individual destination and personality, with HPTR, the heterogeneous polyline transformer with relative pose encoding. To improve performance on the WOSAC leaderboard, we apply scheduled teacher-forcing at training time and filter the sampled scenarios at inference time. The code is available at https://github.com/zhejz/TrafficBotsV1.5.
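Scheduled teacher-forcing can be sketched as follows: during a training rollout, the next input is the ground-truth state with some probability and the model's own prediction otherwise, and that probability is decayed over training. The linear schedule, the function names, and the scalar toy state are assumptions made for illustration; the report's actual schedule and state representation may differ.

```python
import random


def teacher_forcing_prob(step, total_steps, p_start=1.0, p_end=0.0):
    """Linearly anneal the teacher-forcing probability (hypothetical schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return p_start + (p_end - p_start) * frac


def rollout(policy, gt_states, p_teacher, rng):
    """Roll out a one-step policy, mixing ground truth and predictions.

    With probability p_teacher the next input is the ground-truth state;
    otherwise the policy's own prediction is fed back (closed loop).
    """
    state = gt_states[0]
    preds = []
    for t in range(1, len(gt_states)):
        pred = policy(state)
        preds.append(pred)
        state = gt_states[t] if rng.random() < p_teacher else pred
    return preds


# Toy usage: a "policy" that adds 1 to a scalar state.
rng = random.Random(0)
gt = [0, 10, 20, 30]
fully_forced = rollout(lambda s: s + 1, gt, p_teacher=1.0, rng=rng)
closed_loop = rollout(lambda s: s + 1, gt, p_teacher=0.0, rng=rng)
```

With `p_teacher=1.0` every step restarts from the logged state, while `p_teacher=0.0` matches the closed-loop setting used at inference, so annealing between the two gradually exposes the policy to its own errors.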
