nuPlan-R: A Closed-Loop Planning Benchmark for Autonomous Driving via Reactive Multi-Agent Simulation

Mingxing Peng, Ruoyu Yao, Xusen Guo, Jun Ma

TL;DR

nuPlan-R addresses a gap in closed-loop autonomous-vehicle (AV) planning benchmarks: it replaces the rule-based Intelligent Driver Model (IDM) agents with learned, diffusion-based reactive agents and adds a selective-update mechanism to simulate realistic multi-agent interactions. The agent model is a Nexus-inspired, noise-decoupled diffusion model with a diffusion transformer backbone, and the benchmark introduces two metrics, SR and PR, to assess robustness and balance. The authors reimplement and evaluate a range of planners in both nuPlan and nuPlan-R, showing that learning-based planners gain an advantage in complex interactive scenarios while rule-based baselines underperform in the more realistic environment. The authors plan to open-source the benchmark, which they present as a more reliable, fair, and informative evaluation platform for autonomous driving planners.

Abstract

Recent advances in closed-loop planning benchmarks have significantly improved the evaluation of autonomous vehicles. However, existing benchmarks still rely on rule-based reactive agents such as the Intelligent Driver Model (IDM), which lack behavioral diversity and fail to capture realistic human interactions, leading to oversimplified traffic dynamics. To address these limitations, we present nuPlan-R, a new reactive closed-loop planning benchmark that integrates learning-based reactive multi-agent simulation into the nuPlan framework. Our benchmark replaces the rule-based IDM agents with noise-decoupled diffusion-based reactive agents and introduces an interaction-aware agent selection mechanism to ensure both realism and computational efficiency. Furthermore, we extend the benchmark with two additional metrics to enable a more comprehensive assessment of planning performance. Extensive experiments demonstrate that our reactive agent model produces more realistic, diverse, and human-like traffic behaviors, leading to a benchmark environment that better reflects real-world interactive driving. We further reimplement a collection of rule-based, learning-based, and hybrid planning approaches within our nuPlan-R benchmark, providing a clearer reflection of planner performance in complex interactive scenarios and better highlighting the advantages of learning-based planners in handling complex and dynamic scenarios. These results establish nuPlan-R as a new standard for fair, reactive, and realistic closed-loop planning evaluation. We will open-source the code for the new benchmark.

Paper Structure

This paper contains 11 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Road map of the reactive closed-loop planning benchmark nuPlan-R. The Reactive Agent Model is our trained learning-based reactive agent, which replaces the traditional rule-based IDM agents to enable realistic and interactive traffic simulation.
  • Figure 2: Visual illustration of the results. (a) TTC Distribution: Comparison of time-to-collision (TTC) distributions for log-replay, rule-based IDM, and our learning-based reactive agent, showing that our model aligns better with real-world statistics. (b) Cluster-Trajectory: Visualization of representative trajectory clusters for (1) log-replay data, (2) IDM simulation, and (3) our reactive simulation, demonstrating that our model captures richer and more realistic multi-agent motion patterns.
  • Figure 3: Qualitative comparison of reactive closed-loop simulation behaviors between rule-based IDM agents and our learning-based agents across five representative scenarios. Our model produces more realistic and reactive interactions that better resemble real-world driving behaviors.
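
Figure 2(a) compares TTC distributions, so it may help to see how a time-to-collision value is commonly computed. The sketch below uses the standard constant-velocity definition, with each agent approximated as a circle. This summary does not state how the paper computes TTC, so the function name `time_to_collision`, the `radius` parameter, and the circle approximation are all illustrative assumptions rather than the authors' method.

```python
import math


def time_to_collision(p_a, v_a, p_b, v_b, radius=1.0):
    """Constant-velocity TTC between two agents modeled as circles (assumption).

    p_*, v_* are (x, y) positions in meters and velocities in m/s.
    Returns the earliest t >= 0 at which the centers are within 2 * radius,
    or math.inf if the agents never get that close.
    """
    # Position and velocity of agent b relative to agent a.
    rx, ry = p_b[0] - p_a[0], p_b[1] - p_a[1]
    vx, vy = v_b[0] - v_a[0], v_b[1] - v_a[1]
    d = 2.0 * radius

    # Solve |r + v * t|^2 = d^2, i.e. a*t^2 + b*t + c = 0.
    a = vx * vx + vy * vy
    b = 2.0 * (rx * vx + ry * vy)
    c = rx * rx + ry * ry - d * d

    if c <= 0.0:
        return 0.0  # circles already overlap
    if a == 0.0:
        return math.inf  # no relative motion
    disc = b * b - 4.0 * a * c
    if disc < 0.0 or b >= 0.0:
        return math.inf  # paths miss each other, or the agents are separating
    return (-b - math.sqrt(disc)) / (2.0 * a)


# A follower at 10 m/s approaching a stopped vehicle 50 m ahead.
ttc = time_to_collision((0.0, 0.0), (10.0, 0.0), (50.0, 0.0), (0.0, 0.0))
```

Collecting such values over all agent pairs and timesteps of a simulation, and doing the same for the logged data, would give histograms of the kind the caption describes.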