Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation
Fabian Konstantinidis, Moritz Sackmann, Ulrich Hofmann, Christoph Stiller
TL;DR
This work tackles the dual challenge of realism and efficiency in multi-agent driving simulation by adopting instance-centric observations, where each entity has its own local frame, enabling viewpoint-invariant encoding and token reuse for static map elements. A query-centric symmetric context encoder with relative pose encodings models inter-agent and inter-element interactions, while Adversarial Inverse Reinforcement Learning provides a learnable reward signal, stabilized by an adaptive reward transformation. The approach demonstrates strong robustness and scalability, outperforming agent-centric baselines on two driving datasets and achieving faster training and inference through reusable map tokens and restricted interaction scopes. The results also show improved cross-dataset generalization, highlighting the method’s potential for scalable, realistic simulation in varied traffic scenarios.
Abstract
Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individual traffic participants. To improve efficiency, we adopt an instance-centric scene representation, where each traffic participant and map element is modeled in its own local coordinate frame. This design enables efficient, viewpoint-invariant scene encoding and allows static map tokens to be reused across simulation steps. To model interactions, we employ a query-centric symmetric context encoder with relative positional encodings between local frames. We use Adversarial Inverse Reinforcement Learning to learn the behavior model and propose an adaptive reward transformation that automatically balances robustness and realism during training. Experiments demonstrate that our approach scales efficiently with the number of tokens, significantly reducing training and inference times, while outperforming several agent-centric baselines in terms of positional accuracy and robustness.
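The viewpoint invariance that the abstract attributes to the instance-centric representation can be illustrated with a small geometric sketch. The code below is not the authors' implementation. It assumes each instance's local frame is a 2D pose (x, y, heading), and the function names `relative_pose`, `wrap_angle`, and `rigid_transform` are hypothetical. It shows only that the relative pose between two local frames is unchanged when the whole scene is shifted and rotated. This is the property that would let static map tokens be encoded once and reused across simulation steps.

```python
import math


def wrap_angle(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def relative_pose(frame_i, frame_j):
    """Pose of frame_j expressed in the local coordinates of frame_i.

    Each frame is an (x, y, heading) tuple in a shared global frame.
    This 2D pose layout is an assumption made for illustration.
    """
    xi, yi, hi = frame_i
    xj, yj, hj = frame_j
    dx, dy = xj - xi, yj - yi
    cos_i, sin_i = math.cos(hi), math.sin(hi)
    # Rotate the global offset by -heading_i to express it in frame_i.
    rel_x = cos_i * dx + sin_i * dy
    rel_y = -sin_i * dx + cos_i * dy
    return rel_x, rel_y, wrap_angle(hj - hi)


def rigid_transform(pose, shift, rotation):
    """Rotate a pose about the global origin, then translate it."""
    x, y, h = pose
    c, s = math.cos(rotation), math.sin(rotation)
    return (c * x - s * y + shift[0],
            s * x + c * y + shift[1],
            wrap_angle(h + rotation))


# Hypothetical scene: an agent heading north and a lane segment 10 m ahead.
agent = (10.0, 5.0, math.pi / 2)
lane = (10.0, 15.0, math.pi / 2)
before = relative_pose(agent, lane)

# Move the whole scene to a different global viewpoint.
shift, rotation = (-250.0, 80.0), 1.2
after = relative_pose(rigid_transform(agent, shift, rotation),
                      rigid_transform(lane, shift, rotation))
```

Here `before` and `after` agree up to floating-point error, so an encoder that sees only local features plus such relative poses does not depend on the choice of global frame. How the paper turns relative poses into encodings for the context encoder is not specified in this summary and is not modeled here.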
