
Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation

Fabian Konstantinidis, Moritz Sackmann, Ulrich Hofmann, Christoph Stiller

TL;DR

This work tackles the dual challenge of realism and efficiency in multi-agent driving simulation by adopting instance-centric observations, where each entity has its own local frame, enabling viewpoint-invariant encoding and token reuse for static map elements. A query-centric symmetric context encoder with relative pose encodings models inter-agent and inter-element interactions, while Adversarial Inverse Reinforcement Learning provides a learnable reward signal, stabilized by an adaptive reward transformation. The approach demonstrates strong robustness and scalability, outperforming agent-centric baselines on two driving datasets and achieving faster training and inference through reusable map tokens and restricted interaction scopes. The results also show improved cross-dataset generalization, highlighting the method’s potential for scalable, realistic simulation in varied traffic scenarios.
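The viewpoint invariance of instance-centric observations comes from describing every pair of instances only through their relative pose. The following is a minimal sketch, not the authors' implementation. It expresses one instance in another's local frame and checks that the result is unchanged when the whole scene is moved and rotated. The names `Pose`, `relative_pose`, `rigid_transform`, and `pose_features` are hypothetical. The distance/sin/cos feature vector is likewise only an assumed stand-in for whatever relative pose encoding the model actually uses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """2D pose (position and heading) of an agent or map element."""
    x: float
    y: float
    theta: float  # heading in radians


def wrap_angle(a: float) -> float:
    """Wrap an angle into [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def relative_pose(ref: Pose, other: Pose) -> Pose:
    """Express `other` in the local frame of `ref`.

    Only differences of global quantities are used, so the result is
    unchanged if the whole scene is translated or rotated.
    """
    dx, dy = other.x - ref.x, other.y - ref.y
    c, s = math.cos(ref.theta), math.sin(ref.theta)
    # Rotate the global displacement by -ref.theta into ref's local axes.
    return Pose(c * dx + s * dy, -s * dx + c * dy, wrap_angle(other.theta - ref.theta))


def rigid_transform(p: Pose, tx: float, ty: float, rot: float) -> Pose:
    """Rotate a pose about the global origin by `rot`, then translate by (tx, ty)."""
    c, s = math.cos(rot), math.sin(rot)
    return Pose(c * p.x - s * p.y + tx, s * p.x + c * p.y + ty, wrap_angle(p.theta + rot))


def pose_features(rel: Pose) -> list[float]:
    """Hypothetical encoder input: distance, bearing and relative heading as sin/cos."""
    bearing = math.atan2(rel.y, rel.x)
    return [math.hypot(rel.x, rel.y),
            math.cos(bearing), math.sin(bearing),
            math.cos(rel.theta), math.sin(rel.theta)]


if __name__ == "__main__":
    ego = Pose(10.0, 5.0, 0.3)
    lane = Pose(14.0, 7.0, 0.5)
    before = pose_features(relative_pose(ego, lane))
    # Move and rotate the entire scene; compare the relative features.
    after = pose_features(relative_pose(rigid_transform(ego, -3.0, 8.0, 1.2),
                                        rigid_transform(lane, -3.0, 8.0, 1.2)))
    print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after)))
```

Because each map element's own features are computed in its local frame, they do not depend on which agent is observing. This is what makes it possible to encode such a token once and share it, while the pairwise relation is carried separately by the relative pose.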

Abstract

Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individual traffic participants. To improve efficiency, we adopt an instance-centric scene representation, where each traffic participant and map element is modeled in its own local coordinate frame. This design enables efficient, viewpoint-invariant scene encoding and allows static map tokens to be reused across simulation steps. To model interactions, we employ a query-centric symmetric context encoder with relative positional encodings between local frames. We use Adversarial Inverse Reinforcement Learning to learn the behavior model and propose an adaptive reward transformation that automatically balances robustness and realism during training. Experiments demonstrate that our approach scales efficiently with the number of tokens, significantly reducing training and inference times, while outperforming several agent-centric baselines in terms of positional accuracy and robustness.
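For readers unfamiliar with Adversarial Inverse Reinforcement Learning, the sketch below shows how the standard AIRL formulation turns a discriminator into a reward. The discriminator has the form D = exp(f) / (exp(f) + π(a|s)), where f is a learned scoring function. The reward log D − log(1 − D) then reduces to f − log π(a|s). Here `f` and `log_pi` are hypothetical scalar inputs standing in for the learned function and the policy's log-probability. The adaptive reward transformation proposed in the paper presumably acts on a reward of this kind, but its form is not given in this overview, so it is not reproduced.

```python
import math


def airl_discriminator(f: float, log_pi: float) -> float:
    """AIRL discriminator D = exp(f) / (exp(f) + pi(a|s)).

    Dividing numerator and denominator by exp(f) gives
    D = sigmoid(f - log pi), evaluated here in a numerically stable way.
    """
    z = f - log_pi
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def airl_reward(f: float, log_pi: float) -> float:
    """Reward log D - log(1 - D), i.e. the logit of D, which equals f - log pi(a|s)."""
    return f - log_pi


if __name__ == "__main__":
    f, log_pi = 0.7, -1.2  # hypothetical discriminator score and policy log-probability
    d = airl_discriminator(f, log_pi)
    # The closed-form reward and the explicit log-ratio agree up to rounding.
    print(d, airl_reward(f, log_pi), math.log(d) - math.log(1.0 - d))
```

Computing the reward as `f - log_pi` directly avoids taking the logarithm of a discriminator output that may have saturated to 0 or 1.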

Paper Structure

This paper contains 10 sections, 7 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Illustration of different scene representations. Our instance-centric representation encodes each instance in its own coordinate frame, enabling shared feature extraction.
  • Figure 2: Single simulation step: The behavior model maps observations to actions, which are then executed via a kinematic bicycle model.
  • Figure 3: Illustration of an example situation using instance-centric observations. Simulated and corresponding ground-truth vehicles are depicted as solid and outlined rectangles, respectively.
  • Figure 4: Illustration of the proposed instance-centric behavior model mapping observations to actions. Instance encoders convert observations into latent tokens. These tokens are then augmented with positional encodings relative to the target agent before being passed through multiple layers of our refinement module. Lastly, each refined actor token is decoded into its corresponding actions. Static map tokens can be reused across simulation steps.
  • Figure 5: Regressed inference latency of a single policy-network forward pass with respect to the number of agents: (a) initial simulation step; (b) subsequent simulation steps. Measured on an A100 GPU. Dots denote individual scenes.
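Figure 2 states that the behavior model's actions are executed via a kinematic bicycle model. The following is a minimal sketch of one forward-Euler step of the common center-of-gravity variant of that model. It assumes the action is a pair (longitudinal acceleration, front steering angle). That action parameterization, the axle distances `l_f`/`l_r`, the time step `dt`, and the clamp that keeps speed non-negative are all illustrative assumptions, not values taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleState:
    x: float    # center-of-gravity position [m]
    y: float    # center-of-gravity position [m]
    psi: float  # heading [rad]
    v: float    # speed [m/s]


def bicycle_step(s: VehicleState, accel: float, steer: float,
                 dt: float = 0.1, l_f: float = 1.4, l_r: float = 1.4) -> VehicleState:
    """One forward-Euler step of a center-of-gravity kinematic bicycle model.

    accel: longitudinal acceleration [m/s^2]; steer: front steering angle [rad].
    l_f / l_r: distances from the center of gravity to the front / rear axle [m].
    """
    # Slip angle of the velocity vector at the center of gravity.
    beta = math.atan(l_r / (l_f + l_r) * math.tan(steer))
    x = s.x + s.v * math.cos(s.psi + beta) * dt
    y = s.y + s.v * math.sin(s.psi + beta) * dt
    psi = s.psi + s.v / l_r * math.sin(beta) * dt
    # Assumption: vehicles do not reverse, so speed is clamped at zero.
    v = max(0.0, s.v + accel * dt)
    return VehicleState(x, y, psi, v)


if __name__ == "__main__":
    state = VehicleState(0.0, 0.0, 0.0, 10.0)
    for _ in range(10):  # 1 s of a gentle left turn with mild acceleration
        state = bicycle_step(state, accel=0.5, steer=0.05)
    print(state)
```

Executing actions through a kinematic model in this way keeps simulated trajectories physically plausible regardless of what the policy network outputs.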