KiGRAS: Kinematic-Driven Generative Model for Realistic Agent Simulation

Jianbo Zhao, Jiaheng Zhuang, Qibin Zhou, Taiyu Ban, Ziyao Xu, Hangning Zhou, Junhe Wang, Guoan Wang, Zhiheng Li, Bin Li

TL;DR

KiGRAS redefines autonomous-driving trajectory generation by learning in the action space and using a kinematic model to map actions to physically feasible trajectories, thereby removing large portions of redundant state-space representations. A discrete action space (63×63) pairs with a kinematic bicycle model and a unified scene encoder to support a transformer-based autoregressor with causal attention, enabling compact, realistic multi-agent simulations. The approach includes an inverse kinematic transformation via MPC for training labels, plus a Discriminative Policy Optimization fine-tuning pathway to customize driving habits such as safety, speed, and smoothness. Empirical results on Waymo SimAgents show state-of-the-art realism with a small parameter footprint, supported by qualitative scenarios and ablation analyses that validate the design choices.

Abstract

Trajectory generation is a pivotal task in autonomous driving. Recent studies have introduced the autoregressive paradigm, leveraging the state transition model to approximate future trajectory distributions. This paradigm closely mirrors the real-world trajectory generation process and has achieved notable success. However, its potential is limited by the ineffective representation of realistic trajectories within the redundant state space. To address this limitation, we propose the Kinematic-Driven Generative Model for Realistic Agent Simulation (KiGRAS). Instead of modeling in the state space, KiGRAS factorizes the driving scene into action probability distributions at each time step, providing a compact space to represent realistic driving patterns. By establishing physical causality from actions (cause) to trajectories (effect) through the kinematic model, KiGRAS eliminates a large number of redundant trajectories. All states derived from actions in the cause space are constrained to be physically feasible. Furthermore, redundant trajectories representing identical action sequences are mapped to the same representation, reflecting their underlying actions. This approach significantly reduces task complexity and ensures physical feasibility. KiGRAS achieves state-of-the-art performance in Waymo's SimAgents Challenge, ranking first on the WOMD leaderboard with significantly fewer parameters than other models. The video documentation is available at https://kigras-mach.github.io/KiGRAS/.
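
To make the action-to-trajectory mapping concrete, the following is a minimal sketch of how a discrete action token could be decoded and pushed through a kinematic bicycle model. Only the 63×63 grid size and the use of a kinematic bicycle model come from the text above. The choice of acceleration and steering angle as the two action dimensions, the value ranges, the wheelbase, the time step, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass

N_BINS = 63  # bins per action dimension (63x63 grid, from the text)

# Everything below is a hypothetical choice; the source gives no values.
ACC_RANGE = (-5.0, 5.0)    # acceleration range in m/s^2 (assumed)
STEER_RANGE = (-0.5, 0.5)  # steering-angle range in rad (assumed)
WHEELBASE = 2.8            # metres (assumed)
DT = 0.1                   # seconds per step (assumed)


@dataclass(frozen=True)
class State:
    x: float
    y: float
    heading: float
    speed: float


def bin_to_value(index, lo, hi):
    """Map a bin index in [0, N_BINS) to an evenly spaced value in [lo, hi]."""
    if not 0 <= index < N_BINS:
        raise ValueError(f"bin index out of range: {index}")
    return lo + (hi - lo) * index / (N_BINS - 1)


def decode_action(token):
    """Split a flat token in [0, 63*63) into (acceleration, steering angle)."""
    acc_idx, steer_idx = divmod(token, N_BINS)
    return (bin_to_value(acc_idx, *ACC_RANGE),
            bin_to_value(steer_idx, *STEER_RANGE))


def step(state, token):
    """Advance one step with a rear-axle kinematic bicycle model (Euler)."""
    acc, steer = decode_action(token)
    x = state.x + state.speed * math.cos(state.heading) * DT
    y = state.y + state.speed * math.sin(state.heading) * DT
    heading = state.heading + state.speed / WHEELBASE * math.tan(steer) * DT
    speed = max(0.0, state.speed + acc * DT)  # no reversing in this sketch
    return State(x, y, heading, speed)


def rollout(state, tokens):
    """Turn a token sequence into a state trajectory, including the start."""
    states = [state]
    for token in tokens:
        states.append(step(states[-1], token))
    return states
```

Because every state is produced by `step`, any token sequence yields a trajectory that respects the model's kinematics, which is the sense in which an action-space representation rules out infeasible trajectories by construction. For example, `rollout(State(0.0, 0.0, 0.0, 10.0), [31 * 63 + 31] * 10)` applies the centre bin (zero acceleration, zero steering under the assumed ranges) ten times and moves the agent straight ahead at constant speed.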

Paper Structure (18 sections, 9 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Performance comparison of different models by parameter count (M) and Realism Meta-metric, the overall metric of Waymo's SimAgents challenge. Each marker represents a specific model with its parameter count and realism meta-metric score. For more details, see the performance comparison section.
  • Figure 2: The architecture of the KiGRAS framework. First, we solve for the sequence of control actions from trajectory states using the inverse kinematic transformation module. These actions are then represented in a discrete space and drive the forward state update during inference. To encode the traffic scene, we use a unified spatial encoder to embed the spatial information of all agents and map lines, along with two attribute encoders to describe their traffic roles. Building on this, an autoregressive transformer-based decoder decodes the action probability distribution for each state.
  • Figure 3: An illustration of the rolling horizon strategy in the inverse kinematic transformation module. We use the $k$ consecutive states covered by the rolling time window to simultaneously optimize the sequence of control actions for these states (green-lined blocks). The optimal action for the closest future state (blue-lined blocks) is taken as output to mitigate accumulated errors. After these steps, the rolling time window moves forward one step to iteratively solve for all actions.
  • Figure 4: Qualitative results of closed-loop simulation. We present four representative scenarios in which the trajectories of all agents are generated by KiGRAS. In a fully closed-loop setting, we simulate the future for 8 seconds. Agents of interest are highlighted with distinct colors (some with labels), and their one-second historical trajectories are shown to illustrate their speed changes.
  • Figure 5: Performance Comparison of Pre-trained Driver and Fast Driver Models. Both models were simulated 16 times for 8 seconds each under semi-closed-loop settings.
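
The rolling horizon strategy described in the Figure 3 caption can be sketched in a few lines. The source states that the inverse kinematic transformation is solved via MPC. The sketch below swaps in an exhaustive search over a coarse action grid and uses a 1D point-mass model instead of the bicycle model, purely to keep the example short. The candidate grid, time step, window length, and function names are all hypothetical.

```python
import itertools

DT = 0.1                                  # seconds per step (assumed)
CANDIDATES = [-2.0, -1.0, 0.0, 1.0, 2.0]  # accelerations in m/s^2 (assumed)


def step(pos, speed, acc):
    """1D point-mass update: speed first, then position."""
    speed = speed + acc * DT
    return pos + speed * DT, speed


def window_cost(pos, speed, actions, targets):
    """Sum of squared position errors when applying `actions` in order."""
    cost = 0.0
    for acc, target in zip(actions, targets):
        pos, speed = step(pos, speed, acc)
        cost += (pos - target) ** 2
    return cost


def recover_actions(positions, speed0, k=3):
    """Recover one action per transition with a rolling window of k states.

    At each step, all action sequences over the next (up to) k observed
    states are scored jointly, only the first action of the best sequence
    is kept, and the window then moves forward by one step.
    """
    pos, speed = positions[0], speed0
    actions = []
    for t in range(len(positions) - 1):
        targets = positions[t + 1 : t + 1 + k]  # shorter near the end
        best = min(
            itertools.product(CANDIDATES, repeat=len(targets)),
            key=lambda seq: window_cost(pos, speed, seq, targets),
        )
        actions.append(best[0])
        # Continue from the simulated state, not the observed one, so the
        # recovered actions reproduce the trajectory when replayed.
        pos, speed = step(pos, speed, best[0])
    return actions
```

Optimizing over the whole window while committing only to the first action lets later observations correct the current choice, which is how the caption describes mitigating accumulated errors. On a trajectory generated by the same model with actions drawn from `CANDIDATES`, `recover_actions` returns the generating actions.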