UniGen: Unified Modeling of Initial Agent States and Trajectories for Generating Autonomous Driving Scenarios

Reza Mahjourian; Rongbing Mu; Valerii Likhosherstov; Paul Mougin; Xiukun Huang; Joao Messias; Shimon Whiteson

TL;DR

The paper addresses scalable generation of diverse, safety-critical autonomous driving scenarios by formulating the conditional problem $p(S|R,S_c)$ and introducing UniGen, a unified autoregressive model that jointly predicts new agents' initial states and future trajectories from a shared scene embedding. UniGen fuses a global shared encoder with an agent-centric road-layout transformer and employs three decoders for occupancy, attributes, and trajectories to ensure consistent, multimodal proposals. The autoregressive generation injects agents one by one, conditioning each new agent on the existing agents and their full trajectories, which improves realism and reduces collisions. On the Waymo Open Motion Dataset, UniGen achieves state-of-the-art results on scene distribution and motion metrics, with significantly lower static and dynamic collision rates than prior methods. Ablations highlight the importance of the agent-centric encoder and trajectory conditioning.
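
One way to spell out the autoregressive structure described above is the factorization below. It is a sketch, not the paper's own equation. It assumes $R$ denotes the road layout and scene context and $S_c$ the existing agents, which this summary does not define explicitly. The per-agent symbols are notation introduced here for illustration: $S = (s_1, \dots, s_n)$ with $s_i = (l_i, a_i, \tau_i)$ for the location, attributes, and trajectory of the $i$-th injected agent.

$$
p(S \mid R, S_c) = \prod_{i=1}^{n} p(l_i \mid R, S_c, s_{<i}) \; p(a_i \mid l_i, R, S_c, s_{<i}) \; p(\tau_i \mid l_i, a_i, R, S_c, s_{<i})
$$

The three factors correspond to the occupancy, attribute, and trajectory decoders. Because each conditions on $s_{<i}$, every new agent depends on all previously injected agents and their trajectories.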

Abstract

This paper introduces UniGen, a novel approach to generating new traffic scenarios for evaluating and improving autonomous driving software through simulation. Our approach models all driving scenario elements in a unified model: the position of new agents, their initial state, and their future motion trajectories. By predicting the distributions of all these variables from a shared global scenario embedding, we ensure that the final generated scenario is fully conditioned on all available context in the existing scene. Our unified modeling approach, combined with autoregressive agent injection, conditions the placement and motion trajectory of every new agent on all existing agents and their trajectories, leading to realistic scenarios with low collision rates. Our experimental results show that UniGen outperforms prior state of the art on the Waymo Open Motion Dataset.

Paper Structure (25 sections, 4 equations, 4 figures, 2 tables)

INTRODUCTION
RELATED WORK
Traffic scenario generation
Motion forecasting
Autoregressive generative models
PROBLEM FORMULATION
METHOD
Masking Ground Truth Scenarios
Input Representation
Shared Scenario Encoder
Occupancy Prediction
Agent-Centric Road Layout Encoder
Per-Agent Feature Fusion
Agent Attribute Prediction
Trajectory Prediction
...and 10 more sections

Figures (4)

Figure 1: UniGen's autoregressive process for iteratively injecting new agents into a scenario. In each iteration, the model fully instantiates the initial state (top) and future trajectory (bottom) for a new agent (highlighted in pink). All properties of the new agent are conditioned on the scene context and the entire trajectories for existing agents (shown in white).
Figure 2: The overall design of UniGen. (a) The sparse inputs to the model consist of the polylines from the road layout, the points representing traffic lights, and the points uniformly sampled from BEV bounding boxes of existing scenario agents, if any. (b) The points are encoded into a dense scenario embedding. Three separate decoders predict the occupancy distribution of new agents to inject, their initial states, and their future trajectories. (c) The occupancy decoder predicts the distribution of initial locations separately for $C$ classes of agents. In each iteration, one location is sampled from the occupancy heatmap to inject a new agent. (d) The location of the new agent is linearly mapped to a location in the dense scenario embedding and a feature patch is extracted surrounding that location. (e) In addition, an agent-centric road layout transformer encoder extracts and encodes the road polylines normalized to the coordinate frame of the injection location. (f) This agent-centric road layout encoding is fused with the flattened feature patch extracted from the shared scenario embedding using a 1-layer MLP. (g) The fused encoding is fed to the attribute decoder to predict the initial agent states as a 5D multivariate mixture distribution with $M$ modes. (h) Five scalar attribute values are sampled, which together with the sampled agent location constitute the complete initial agent state. (i) The trajectory decoder receives this initial agent state in addition to the fused feature encoding from (f), and predicts a set of $K$ trajectories with associated probabilities spanning $T$ timesteps. Each trajectory waypoint is represented by a 2D Gaussian. (j) Finally, a single trajectory is sampled from the $K$ choices. At this point, the new agent is fully instantiated. The new agent is added to the scenario inputs in component (a) and the next iteration starts. Note: At training time, $N$ equals the number of hidden ground-truth agents. At inference time, $N$ equals 1 for injecting a single agent in each iteration.
Figure 3: Masking ground-truth agents for constructing labels for training the occupancy decoder (shown), as well as the attribute and trajectory decoders (not shown). In the ground-truth occupancy grids, only cells containing agent centers are turned on.
Figure 4: Example scenarios generated by UniGen. The left column shows two sample ground-truth scenarios. For each scenario, we remove all agents and generate new agents with trajectories given just the road layout. We apply the method three separate times resulting in three different generated scenarios.
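
The injection loop described in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names (`sample_location`, `extract_patch`, `generate`), the patch size, and the decoder call signatures are hypothetical. The agent-centric road layout encoder and the MLP fusion from steps (e) and (f) are omitted for brevity, and the encoder and decoders are passed in as plain callables.

```python
import numpy as np


def sample_location(occupancy, rng):
    """Step (c): sample (class, row, col) from a heatmap of shape (C, H, W)."""
    probs = occupancy.ravel() / occupancy.sum()
    flat_index = rng.choice(probs.size, p=probs)
    return np.unravel_index(flat_index, occupancy.shape)


def extract_patch(embedding, row, col, grid_hw, size=3):
    """Step (d): linearly map a grid cell to the (H', W', D) embedding and
    cut out a size x size patch around it (zero-padded at the borders).
    The patch size is an assumption made for this sketch."""
    emb_h, emb_w = embedding.shape[:2]
    r = int(row * emb_h / grid_hw[0])
    c = int(col * emb_w / grid_hw[1])
    half = size // 2
    padded = np.pad(embedding, ((half, half), (half, half), (0, 0)))
    # After padding, original cell (r, c) sits at (r + half, c + half),
    # so this slice is centered on it.
    return padded[r:r + size, c:c + size]


def generate(encode, occupancy_dec, attribute_dec, trajectory_dec,
             existing_agents, num_new_agents, rng):
    """Inject agents one at a time; each iteration re-encodes the scene so the
    new agent is conditioned on all agents (and trajectories) added so far."""
    agents = list(existing_agents)
    for _ in range(num_new_agents):
        embedding = encode(agents)                      # steps (a)-(b)
        occupancy = occupancy_dec(embedding)            # (C, H, W)
        cls, row, col = sample_location(occupancy, rng)
        patch = extract_patch(embedding, row, col, occupancy.shape[1:])
        attributes = attribute_dec(patch, rng)          # steps (g)-(h)
        trajectory = trajectory_dec(patch, (row, col), attributes, rng)  # (i)-(j)
        agents.append({
            "class": int(cls),
            "location": (int(row), int(col)),
            "attributes": attributes,
            "trajectory": trajectory,
        })
    return agents
```

Re-running the encoder inside the loop is the point of the design: the occupancy heatmap for the next agent is computed from a scene that already contains the previous agent and its sampled trajectory, which is what lets the model avoid placing agents on top of each other or on colliding paths.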
