Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world
Eugene Vinitsky, Nathan Lichtlé, Xiaomeng Yang, Brandon Amos, Jakob Foerster
TL;DR
Nocturne introduces a fast, vector-based, 2D driving simulator to study multi-agent coordination under partial observability, avoiding image rendering in favor of real-world trajectory-derived observations. It constructs diverse real-world scenes from the Waymo Motion dataset, enforcing fixed sensing and action constraints to examine how decentralized agents learn to reach goal states without collisions. Through RL (APPO) and imitation (BC) baselines, the paper demonstrates that agents struggle with high-interaction scenarios and human-like trajectory mimicry remains challenging, even with extensive data. The work provides a scalable benchmark to push toward safer, more human-like coordination in real-world driving, with reproducibility, ethical considerations, and avenues for future extensions such as predictive goal inference and generative scene augmentation.
Abstract
We introduce Nocturne, a new 2D driving simulator for investigating multi-agent coordination under partial observability. The focus of Nocturne is to enable research into inference and theory of mind in real-world multi-agent settings without the computational overhead of computer vision and feature extraction from images. Agents in this simulator only observe an obstructed view of the scene, mimicking human visual sensing constraints. Unlike existing benchmarks that are bottlenecked by rendering human-like observations directly using a camera input, Nocturne uses efficient intersection methods to compute a vectorized set of visible features in a C++ back-end, allowing the simulator to run at over 2000 steps-per-second. Using open-source trajectory and map data, we construct a simulator to load and replay arbitrary trajectories and scenes from real-world driving data. Using this environment, we benchmark reinforcement-learning and imitation-learning agents and demonstrate that the agents are quite far from human-level coordination ability and deviate significantly from the expert trajectories.
