Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world

Eugene Vinitsky; Nathan Lichtlé; Xiaomeng Yang; Brandon Amos; Jakob Foerster

TL;DR

Nocturne introduces a fast, vector-based 2D driving simulator for studying multi-agent coordination under partial observability. Instead of rendering images, it computes vectorized observations directly from scenes built on real-world trajectory data. It constructs diverse real-world scenes from the Waymo Open Motion Dataset and imposes fixed sensing and action constraints to examine how decentralized agents learn to reach goal states without colliding. Using reinforcement-learning (APPO) and imitation-learning (BC) baselines, the paper shows that agents struggle in high-interaction scenarios and that reproducing human-like trajectories remains challenging, even with extensive data. The work provides a scalable benchmark for pushing toward safer, more human-like coordination in real-world driving. It also discusses reproducibility, ethical considerations, and future extensions such as predictive goal inference and generative scene augmentation.

Abstract

We introduce Nocturne, a new 2D driving simulator for investigating multi-agent coordination under partial observability. The focus of Nocturne is to enable research into inference and theory of mind in real-world multi-agent settings without the computational overhead of computer vision and feature extraction from images. Agents in this simulator only observe an obstructed view of the scene, mimicking human visual sensing constraints. Unlike existing benchmarks that are bottlenecked by rendering human-like observations directly using a camera input, Nocturne uses efficient intersection methods to compute a vectorized set of visible features in a C++ back-end, allowing the simulator to run at over 2000 steps-per-second. Using open-source trajectory and map data, we construct a simulator to load and replay arbitrary trajectories and scenes from real-world driving data. Using this environment, we benchmark reinforcement-learning and imitation-learning agents and demonstrate that the agents are quite far from human-level coordination ability and deviate significantly from the expert trajectories.
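The abstract says that Nocturne computes the set of visible features with intersection methods rather than by rendering images. The following is a minimal Python sketch of that idea, offered only as an illustration. It makes several simplifying assumptions: objects are points, occluders are line segments, and the field of view is a cone defined by a hypothetical half-angle `half_angle` and range `max_dist`. It is not Nocturne's C++ back-end, and every function name here is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def segments_intersect(s1: Segment, s2: Segment) -> bool:
    """True if the two segments cross at a single interior point.

    Touching endpoints and collinear overlaps count as non-blocking,
    a simplification chosen for this sketch.
    """
    p1, p2 = s1
    q1, q2 = s2
    d1 = _cross(q1, q2, p1)
    d2 = _cross(q1, q2, p2)
    d3 = _cross(p1, p2, q1)
    d4 = _cross(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def in_view_cone(ego: Point, heading: float, target: Point,
                 half_angle: float, max_dist: float) -> bool:
    """True if target is within max_dist and within half_angle of heading."""
    dx, dy = target[0] - ego[0], target[1] - ego[1]
    dist = math.hypot(dx, dy)
    if dist > max_dist:
        return False
    if dist == 0.0:
        return True
    # Bearing relative to the heading, wrapped to [-pi, pi].
    rel = math.atan2(dy, dx) - heading
    rel = math.atan2(math.sin(rel), math.cos(rel))
    return abs(rel) <= half_angle


def visible_indices(ego: Point, heading: float, targets: Sequence[Point],
                    occluders: Sequence[Segment], half_angle: float,
                    max_dist: float) -> List[int]:
    """Indices of targets inside the view cone with an unblocked line of sight."""
    visible = []
    for i, target in enumerate(targets):
        if not in_view_cone(ego, heading, target, half_angle, max_dist):
            continue
        sight_line = (ego, target)
        if any(segments_intersect(sight_line, occ) for occ in occluders):
            continue
        visible.append(i)
    return visible


if __name__ == "__main__":
    # Ego at the origin facing +x; a short wall at x = 15 hides the far target.
    targets = [(10.0, 0.0), (10.0, 5.0), (-5.0, 0.0), (20.0, 0.0)]
    wall = [((15.0, -1.0), (15.0, 1.0))]
    print(visible_indices((0.0, 0.0), 0.0, targets, wall,
                          half_angle=math.pi / 3, max_dist=50.0))  # [0, 1]
```

The sketch runs the cone check first because it is cheap. It discards targets behind or far from the agent before the per-occluder intersection tests, whose cost grows with the number of occluders.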

Paper Structure (27 sections, 3 equations, 7 figures, 3 tables)

Introduction
Related Work
Benchmark construction
Defining a Nocturne Scene
Partial Observability Model and Collision Handling
Construction of the Partially Observable Stochastic Game
Rules of the Benchmark
Unusual features of the Benchmark
Experiments Setup
Results and Analysis
Success rate of baselines
Human-agent trajectory similarity
Policy Failure Modes
Conclusion
Reproducibility and Ethical Statement
...and 12 more sections

Figures (7)

Figure 1: A visual depiction of the obstruction model used to determine which objects are visible to an agent. (Left) Obstructed view, with the viewing (yellow) agent at the center of the cone. (Right) The original scene, centered on the same agent, with an unobstructed view.
Figure 2: Four scenes demonstrating the diversity of the navigable scenes in Nocturne. Colored circles represent the goal position of the corresponding colored agent. Dots represent the trajectory of the agent, with opacity increasing as time goes on. Videos of experts negotiating these scenes can be found at https://www.nathanlct.com/research/nocturne.
Figure 3: (Left) Success at reaching the specified goal on the training data as a function of the number of environment steps. "Training Files: X" means the agent was trained on X fixed scenes sampled from the training dataset. (Middle) Percent of agents that achieved their goals. (Right) Percent of agents that collided.
Figure 4: (Left) Average displacement error (mean l2-distance between an agent and its expert over the time-steps). (Right) Final displacement error (l2-distance between an agent and its expert at the final time-step at which the expert has a valid state).
Figure 5: Goal rate (left) and collision rate (right) of vehicles as a function of the number of times their corresponding expert trajectory intersected another expert trajectory (intersections). Cases with more than 3 interactions are rare and would produce noisy statistics, so they are pooled into the 3-interaction bin.
...and 2 more figures
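The captions of Figures 4 and 5 define the evaluation metrics in words. The Python sketch below is one plausible reading of those definitions, not the paper's evaluation code. Two choices in it are our own assumptions. First, ADE averages only over time-steps where the expert state is valid. Second, rates are computed per vehicle. The helper names `displacement_errors` and `rate_by_interaction_bin` are also ours.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def displacement_errors(agent: Sequence[Point], expert: Sequence[Point],
                        valid: Sequence[bool]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one agent against its expert trajectory.

    ADE: mean l2 distance over time-steps where the expert state is valid
    (restricting to valid steps is an assumption of this sketch).
    FDE: l2 distance at the last time-step with a valid expert state.
    """
    if not (len(agent) == len(expert) == len(valid)):
        raise ValueError("trajectories and mask must have the same length")
    dists = [math.dist(a, e) for a, e, v in zip(agent, expert, valid) if v]
    if not dists:
        raise ValueError("expert trajectory has no valid time-steps")
    return sum(dists) / len(dists), dists[-1]


def rate_by_interaction_bin(num_intersections: Sequence[int],
                            outcomes: Sequence[bool],
                            max_bin: int = 3) -> Dict[int, float]:
    """Fraction of True outcomes (e.g. goal reached) per interaction bin.

    Counts above max_bin are pooled into the max_bin bin, mirroring the
    Figure 5 convention for rare high-interaction cases.
    """
    totals: Dict[int, int] = {}
    hits: Dict[int, int] = {}
    for count, outcome in zip(num_intersections, outcomes):
        b = min(count, max_bin)
        totals[b] = totals.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + int(outcome)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


if __name__ == "__main__":
    agent = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    expert = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (9.0, 9.0)]
    valid = [True, True, True, False]  # last expert state is invalid
    print(displacement_errors(agent, expert, valid))  # (1.0, 2.0)

    # Vehicles with 3, 5 and 4 intersections all land in bin 3.
    counts = [0, 1, 3, 5, 4]
    goals = [True, False, True, False, False]
    print(rate_by_interaction_bin(counts, goals))
```

Passing per-vehicle collision flags instead of goal flags to the same helper gives the collision-rate panel of Figure 5.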
