Pseudo-Simulation for Autonomous Driving

Wei Cao, Marcel Hallgarten, Tianyu Li, Daniel Dauner, Xunjiang Gu, Caojun Wang, Yakov Miron, Marco Aiello, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, Kashyap Chitta

TL;DR

The paper addresses how to evaluate autonomous driving systems both scalably and robustly. It introduces pseudo-simulation, a two-stage evaluation that augments real observations with synthetic views pre-rendered via 3D Gaussian Splatting. Each synthetic view is weighted by the proximity of its start point to the endpoint of the trajectory planned from the initial real observation. Using the Extended Predictive Driver Model Score (EPDMS) and a Gaussian-weighted aggregation, the method correlates more strongly with closed-loop simulation ($R^2=0.8$) than the best existing open-loop approach ($R^2=0.7$), and it exposes edge cases that open-loop benchmarks overlook. A public NAVSIM v2 leaderboard and accompanying code are released to standardize comparisons and accelerate AV development.

Abstract

Existing evaluation paradigms for Autonomous Vehicles (AVs) face critical limitations. Real-world evaluation is often challenging due to safety concerns and a lack of reproducibility, whereas closed-loop simulation can face insufficient realism or high computational costs. Open-loop evaluation, while being efficient and data-driven, relies on metrics that generally overlook compounding errors. In this paper, we propose pseudo-simulation, a novel paradigm that addresses these limitations. Pseudo-simulation operates on real datasets, similar to open-loop evaluation, but augments them with synthetic observations generated prior to evaluation using 3D Gaussian Splatting. Our key idea is to approximate potential future states the AV might encounter by generating a diverse set of observations that vary in position, heading, and speed. Our method then assigns a higher importance to synthetic observations that best match the AV's likely behavior using a novel proximity-based weighting scheme. This enables evaluating error recovery and the mitigation of causal confusion, as in closed-loop benchmarks, without requiring sequential interactive simulation. We show that pseudo-simulation is better correlated with closed-loop simulations ($R^2=0.8$) than the best existing open-loop approach ($R^2=0.7$). We also establish a public leaderboard for the community to benchmark new methodologies with pseudo-simulation. Our code is available at https://github.com/autonomousvision/navsim.
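As a rough illustration of the reported agreement figures, the sketch below computes $R^2$ between two lists of per-planner scores. It assumes $R^2$ is the coefficient of determination of a simple linear fit, which equals the squared Pearson correlation; the source does not state the exact procedure, and the function name `r_squared` is hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length score lists.

    Assumption: this matches the R^2 of a simple (one-variable) linear fit.
    The source does not specify how its R^2 values were computed.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("scores must not be constant")
    return (sxy * sxy) / (sxx * syy)
```

Here `xs` would hold one benchmark score per planner (for example, the pseudo-simulation score) and `ys` the corresponding closed-loop score for the same planners, in the same order.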

Paper Structure

This paper contains 10 sections, 2 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Pseudo-simulation. (Top) From an initial real-world observation (a), we generate synthetic observations (b, c, d) via a variant of 3D Gaussian Splatting specialized for driving scenes (Li2025ARXIV). Crucially, these synthetic observations are pre-generated prior to evaluation, unlike traditional interactive simulation, where observations are generated online. (Bottom) Pseudo-simulation involves two stages. In Stage 1, we evaluate the AV's trajectory output for (a). In Stage 2, we evaluate the trajectories output for (b, c, d). Stage 2 scores are weighted ($\hat{w}^{(i)}$) based on the proximity of each Stage 2 synthetic observation's start point to the Stage 1 planned endpoint. The aggregated score assesses robustness to small variations near the intended path, prioritizing the most likely futures.
  • Figure 2: Example scenes. We show the poses and front-view camera images for the initial real-world observation and the pre-generated synthetic observations in four scenes.
  • Figure 3: Correlations. (a) Correlation between the default pseudo-simulation metric (EPDMS) and the closed-loop score (CLS) for a set of 37 rule-based and 46 learned planners. We further compare (b) single-stage ($1\mathsf{x}$) vs. two-stage ($2\mathsf{x}$) evaluation, (c) Gaussian weight variances, (d) Stage 1 and Stage 2 aggregation methods, and (e) synthetic observation densities. Defaults in bold-underline.
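The two-stage scoring described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the NAVSIM implementation: the weights are taken to be a normalized Gaussian kernel over the 2D distance between each Stage 2 start point and the Stage 1 planned endpoint, and the two stages are combined by multiplication. The function names, the `sigma` parameter, and the multiplicative aggregation are all hypothetical choices; Figure 3 indicates that the variance and the aggregation method are both design options.

```python
import math


def proximity_weights(endpoint, start_points, sigma=1.0):
    """Normalized Gaussian weights for Stage 2 synthetic observations.

    endpoint: (x, y) of the Stage 1 planned trajectory endpoint.
    start_points: list of (x, y) start positions of synthetic observations.
    sigma: kernel standard deviation (hypothetical default, not from the source).
    """
    if not start_points:
        raise ValueError("need at least one synthetic observation")
    raw = []
    for x, y in start_points:
        dist_sq = (x - endpoint[0]) ** 2 + (y - endpoint[1]) ** 2
        raw.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
    total = sum(raw)
    if total == 0.0:
        # Every start point is numerically too far away: fall back to uniform.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def aggregate(stage1_score, stage2_scores, weights):
    """Combine stage scores.

    Assumption: the Stage 1 score multiplies the weighted mean of Stage 2 scores.
    """
    stage2 = sum(w * s for w, s in zip(weights, stage2_scores))
    return stage1_score * stage2
```

For example, `proximity_weights((10.0, 0.0), [(10.0, 0.0), (10.0, 1.0), (12.0, -2.0)])` gives the largest weight to the first start point, which coincides with the planned endpoint, and the smallest to the third, which is farthest away. A smaller `sigma` concentrates the weight on the nearest synthetic observations, while a larger one spreads it more evenly.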