DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning

Ke Guo, Haochen Liu, Xiaojun Wu, Chen Lv

TL;DR

DecompGAIL addresses the instability of multi-agent Generative Adversarial Imitation Learning (GAIL) in traffic settings by decomposing realism into ego–map (scene) and ego–neighbor (interaction) components, thereby suppressing weakly relevant neighbor–neighbor and neighbor–map signals. It adds a distance-weighted social reward, implemented on a SMART-based Transformer backbone, to encourage realism across all agents rather than for the ego alone. The training pipeline combines behavior cloning (BC) pretraining with a decomposed discriminator and social PPO fine-tuning. The method achieves state-of-the-art realism on the Waymo Open Motion Dataset (WOMD) Sim Agents 2025 benchmark and trains more stably than standard PS-GAIL. More reliable traffic simulation of this kind supports autonomous driving evaluation and urban planning.

Abstract

Realistic traffic simulation is critical for the development of autonomous driving systems and urban mobility planning, yet existing imitation learning approaches often fail to model realistic traffic behaviors. Behavior cloning suffers from covariate shift, while Generative Adversarial Imitation Learning (GAIL) is notoriously unstable in multi-agent settings. We identify a key source of this instability: irrelevant interaction misguidance, where a discriminator penalizes an ego vehicle's realistic behavior due to unrealistic interactions among its neighbors. To address this, we propose Decomposed Multi-agent GAIL (DecompGAIL), which explicitly decomposes realism into ego–map and ego–neighbor components, filtering out misleading neighbor–neighbor and neighbor–map interactions. We further introduce a social PPO objective that augments ego rewards with distance-weighted neighborhood rewards, encouraging overall realism across agents. Integrated into a lightweight SMART-based backbone, DecompGAIL achieves state-of-the-art performance on the WOMD Sim Agents 2025 benchmark.
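
The abstract describes the social PPO reward only at a high level. The following is a minimal sketch of one way a distance-weighted neighborhood reward could be computed. The exponential kernel, the normalization, and the names `social_rewards`, `sigma`, and `lam` are illustrative assumptions, not the paper's formulation.

```python
import math

def social_rewards(positions, ego_rewards, sigma=10.0, lam=0.5):
    """Add a distance-weighted average of other agents' rewards to each agent's own reward.

    Assumptions (not taken from the paper): weights follow exp(-distance / sigma),
    are normalized to sum to 1 per agent, and are mixed in with coefficient lam.
    """
    n = len(positions)
    out = []
    for i in range(n):
        weight_total = 0.0
        weighted_sum = 0.0
        for j in range(n):
            if i == j:
                continue
            d = math.dist(positions[i], positions[j])  # Euclidean distance
            w = math.exp(-d / sigma)                   # closer agents count more
            weight_total += w
            weighted_sum += w * ego_rewards[j]
        # An agent with no neighbors keeps only its own reward.
        neighborhood = weighted_sum / weight_total if weight_total > 0 else 0.0
        out.append(ego_rewards[i] + lam * neighborhood)
    return out

# Three agents on a line: agent 0 is close to agent 1 and far from agent 2.
rewards = social_rewards([(0.0, 0.0), (1.0, 0.0), (100.0, 0.0)], [1.0, 0.0, -1.0])
```

In this toy setup, agent 0's social reward stays close to its own reward of 1.0, because the low-reward agent 2 is far away and receives almost no weight.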

Paper Structure

This paper contains 33 sections, 15 equations, 4 figures, 3 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Comparison with standard decentralized GAIL. (a) A standard decentralized discriminator evaluating local observations can yield spuriously low rewards for ego-realistic behavior due to unrealistic neighbor–neighbor interactions. (b) DecompGAIL separately assesses ego–map and ego–neighbor realism, combining them to obtain a high reward for expert-like ego behavior even when neighbors misbehave.
  • Figure 2: Overview of the DecompGAIL framework with three components: a Map Encoder (gray) extracting map features; a Policy Network (red) predicting motion-token distributions; and a Decomposed Discriminator (green) separately assessing scene (ego–map) and interaction (ego–neighbor) realism for expert and policy trajectories. A weighted combination forms each agent’s reward, which is then augmented with neighborhood rewards to build the social reward used by PPO training.
  • Figure 3: Training stability. In the left plot, solid lines denote the mean output realism scores, while shaded areas indicate the standard deviation. The realism meta-metric is evaluated on the 2% validation split. DecompGAIL maintains lower variance and higher simulation realism than PS-GAIL.
  • Figure 4: Qualitative results on WOSAC. Each row shows a rollout of our model in a different scene. Transparent boxes denote ground-truth agents; solid boxes are agents generated by our model. Agent colors indicate the per-agent reward from our decomposed discriminator. The red rectangle in the first row highlights a low-reward near-collision; the red rectangle in the second row highlights a low-reward off-road tendency.
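
The decomposition described in the captions of Figures 1 and 2 can be illustrated with a toy reward function. This is a sketch under stated assumptions: the log-probability reward, the mean over neighbors, the fixed weights, and the name `decomposed_reward` are hypothetical choices and are not taken from the paper.

```python
import math

def decomposed_reward(scene_score, interaction_scores,
                      w_scene=0.5, w_inter=0.5, eps=1e-8):
    """Combine ego-map and ego-neighbor realism scores into one ego reward.

    scene_score: probability in (0, 1) that the ego-map pair looks expert-like.
    interaction_scores: one such probability per ego-neighbor pair.
    Assumptions (not from the paper): log-probability rewards, a mean over
    neighbors, and fixed combination weights.
    """
    scene_r = math.log(scene_score + eps)
    if interaction_scores:
        inter_r = sum(math.log(s + eps) for s in interaction_scores) / len(interaction_scores)
    else:
        inter_r = 0.0  # no neighbors: only scene realism contributes
    return w_scene * scene_r + w_inter * inter_r

# The ego interacts realistically with both neighbors.
good = decomposed_reward(0.9, [0.9, 0.9])
# The ego itself interacts unrealistically with one neighbor.
bad = decomposed_reward(0.9, [0.1, 0.9])
```

Only ego–map and ego–neighbor scores enter the function, so an unrealistic interaction between two neighbors cannot lower the ego's reward, which is the situation Figure 1(b) depicts.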