TrafficBots V1.5: Traffic Simulation via Conditional VAEs and Transformers with Relative Pose Encoding

Zhejun Zhang; Christos Sakaridis; Luc Van Gool

TL;DR

TrafficBots V1.5 is a closed-loop multi-agent traffic simulation method that combines the CVAE-based TrafficBots policy with the HPTR Transformer framework and its pairwise-relative, KNARPE-based representation. It replaces temporal RNNs with stacked historical observations and relative pose encoding, which scales to multi-agent forecasting while still conditioning each agent's behavior on its own destination and personality. Training uses scheduled sampling and KL regularization with free nats; at inference, sampled scenarios are filtered to reduce collisions. The result is baseline-level realism that lags behind GPT-based approaches on key metrics. The work offers an extensible baseline and highlights practical trade-offs between realism, collision avoidance, and scalability in traffic simulation.
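As a rough illustration of KL regularization with free nats, the sketch below computes the KL divergence between two diagonal Gaussians and clamps it from below. The function names, the diagonal-Gaussian form, and the threshold value are assumptions made for illustration; they are not taken from the TrafficBots V1.5 code.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions (nats)."""
    kl = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        kl += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return kl


def kl_with_free_nats(kl, free_nats):
    """Clamp the KL term from below (hypothetical helper).

    While the KL is under the threshold the loss term is constant, so the
    posterior is not pushed further towards the prior.
    """
    return max(kl, free_nats)


# Posterior shifted by one standard deviation from a unit-variance prior.
kl = gaussian_kl([1.0], [1.0], [0.0], [1.0])
loss_term = kl_with_free_nats(kl, free_nats=0.01)  # threshold is an assumed value
```

The clamp is the common way to implement free nats: the regularizer only becomes active once the posterior carries more information than the allowed budget.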

Abstract

In this technical report we present TrafficBots V1.5, a baseline method for the closed-loop simulation of traffic agents. TrafficBots V1.5 achieves baseline-level performance and a 3rd place ranking in the Waymo Open Sim Agents Challenge (WOSAC) 2024. It is a simple baseline that combines TrafficBots, a CVAE-based multi-agent policy conditioned on each agent's individual destination and personality, with HPTR, the heterogeneous polyline transformer with relative pose encoding. To improve performance on the WOSAC leaderboard, we apply scheduled teacher-forcing at training time and filter the sampled scenarios at inference time. The code is available at https://github.com/zhejz/TrafficBotsV1.5.
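The two leaderboard-oriented techniques named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the linear annealing schedule, the collision-count criterion, and all function names are hypothetical and are not taken from the released code.

```python
import random


def teacher_forcing_prob(step, total_steps, p_start=1.0, p_end=0.0):
    """Linearly anneal the teacher-forcing probability (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return p_start + (p_end - p_start) * frac


def choose_input(ground_truth, prediction, p_teacher, rng=random):
    """Feed the logged state with probability p_teacher, else the model output."""
    return ground_truth if rng.random() < p_teacher else prediction


def filter_scenarios(scenarios, collision_counts, k):
    """Keep the k sampled scenarios with the fewest collisions.

    The sort is stable, so scenarios with equal counts keep their sample order.
    """
    order = sorted(range(len(scenarios)), key=lambda i: collision_counts[i])
    return [scenarios[i] for i in order[:k]]


# Oversample four rollouts, then keep the two with the fewest collisions.
kept = filter_scenarios(["a", "b", "c", "d"], [2, 0, 1, 0], k=2)
```

Early in training the policy mostly sees logged states; as the probability decays it increasingly consumes its own predictions, which is closer to the closed-loop setting used at inference time.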

Paper Structure (10 sections, 2 equations, 1 figure, 1 table)

Introduction
TrafficBots
HPTR
Method
Architecture
Training
Inference
Implementation details
Results
Conclusion

Figures (1)

Figure 1: Network architecture of TrafficBots V1.5. In the brackets are the tensor shapes, where the hidden dimensions are omitted for conciseness. $B$ is the batch size, which is also the number of episodes. $N_\text{M}, N_\text{C}, N_\text{A}$ are, respectively, the number of map polylines, traffic light polylines and agent trajectories. $N_\text{node}$ is the number of segments in each polyline. $T$ is the length of the stacked historical observations. The destination predictor and personality predictor are not visualized. They have a similar structure to the policy network.
