Learning to Drive in New Cities Without Human Demonstrations

Zilin Wang; Saeed Rahmani; Daphne Cornelisse; Bidipta Sarkar; Alexander David Goldie; Jakob Nicolaus Foerster; Shimon Whiteson

TL;DR

This paper introduces NO data Map-based self-play for Autonomous Driving (NOMAD), which enables policy adaptation in a simulator constructed based on the target-city map, demonstrating an effective and scalable alternative to data-intensive city-transfer methods.

Abstract

While autonomous vehicles have achieved reliable performance within specific operating regions, their deployment to new cities remains costly and slow. A key bottleneck is the need to collect many human demonstration trajectories when adapting driving policies to new cities that differ from those seen in training in terms of road geometry, traffic rules, and interaction patterns. In this paper, we show that self-play multi-agent reinforcement learning can adapt a driving policy to a substantially different target city using only the map and meta-information, without requiring any human demonstrations from that city. We introduce NO data Map-based self-play for Autonomous Driving (NOMAD), which enables policy adaptation in a simulator constructed based on the target-city map. Using a simple reward function, NOMAD substantially improves both task success rate and trajectory realism in target cities, demonstrating an effective and scalable alternative to data-intensive city-transfer methods. Project Page: https://nomaddrive.github.io/

Paper Structure (50 sections, 11 equations, 35 figures, 13 tables, 1 algorithm)

Introduction
Related Work
Large-Scale Deployment of Autonomous Driving
City Transfer of Autonomous Driving
Self-Play for Autonomous Driving
Preliminaries and Problem Formulation
Multi-Agent Interaction Model
NOMAD
Overview
Policy Adaptation via Regularized Self-Play
Experimental Setup
Results
Main Results
The Role of Behavioral Priors
The Necessity of Target-City Map
...and 35 more sections

Figures (35)

Figure 1: City transfer in autonomous driving. Top: Zero-shot deployment of an imitation policy trained in a source city into a new city leads to performance degradation due to cross-city distribution shift. Bottom: NOMAD adapts the same policy to the target city using only the target-city map and easily accessible meta-information, without any human demonstrations, by performing map-based self-play multi-agent reinforcement learning in a simulator of the new city. This adaptation substantially improves the policy's success rate and realism in the new city.
Figure 2: NOMAD overview. Starting from a source-city imitation policy $\pi^0$, NOMAD adapts it to a target city $C$ using map segments $m\sim \mathcal{M}_C$ and meta-information $\mathcal{I}_C$. A scenario generator samples initial states and goals that are loaded in a data-driven multi-agent simulator, yielding a simulator of $C$. The policy $\pi_\theta$ is initialized from $\pi^0$ and optimized via KL-regularized self-play MARL. Training checkpoints produce an adapted policy set $\Pi^+(C)$, from which a deployment policy $\pi^{\text{deploy}}$ is chosen based on practitioner preferences.
Figure 3: Success-realism trade-offs under city transfer. We plot mean success rate versus mean realism meta score over 5 runs for three transfer settings: (a) Boston-to-Singapore (primary), (b) Singapore-to-Boston, and (c) Singapore-to-Pittsburgh. Each dot is a NOMAD training checkpoint, colored by the cumulative number of interaction steps; the red dashed curve denotes the empirical Pareto frontier over NOMAD checkpoints. Stars and dashed curves denote reference policies and ablations: the zero-shot transfer behavior cloning policy from the source city $\pi^0$ (BC (Source)); behavior cloning with target-city demonstrations (BC (Target)); BC pretraining followed by self-play on logged target-city scenarios (BC (Target) + RL (Target)); BC pretraining followed by self-play on logged source-city scenarios (BC (Source) + RL (Source)); and RL from scratch in the target city with generated scenarios (RL (Target)).
Figure 4: Comparison of kinematic metrics between self-play with and without behavioral priors. Self-play without behavioral priors struggles to learn kinematically realistic behavior, while NOMAD preserves substantially more realistic motion patterns despite lacking explicit kinematic rewards.
Figure 5: Pareto frontiers of success rate versus realism meta score for different KL weights. Smaller KL weights favor higher success at the cost of realism, while larger KL weights encourage more realistic behaviors but constrain success rate.
...and 30 more figures
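
The empirical Pareto frontier over checkpoints mentioned in the Figure 3 and Figure 5 captions can be illustrated with a minimal sketch. It assumes each checkpoint is summarized by a (success rate, realism meta score) pair and that both quantities are to be maximized. The function name `pareto_frontier` and the example values are hypothetical; this is not the authors' implementation and the numbers are not results from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (success_rate, realism_score), both maximized


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by increasing success rate.

    A point is dominated if another point is at least as good on both
    axes and strictly better on at least one.
    """
    # Visit candidates from highest to lowest success rate; ties are
    # broken so that the higher realism score comes first.
    ordered = sorted(set(points), key=lambda p: (-p[0], -p[1]))
    frontier: List[Point] = []
    best_realism = float("-inf")
    for success, realism in ordered:
        # A point with lower (or equal) success survives only if it is
        # strictly more realistic than everything seen so far.
        if realism > best_realism:
            frontier.append((success, realism))
            best_realism = realism
    return sorted(frontier)


# Hypothetical checkpoint summaries, for illustration only.
checkpoints = [(0.5, 0.8), (0.7, 0.6), (0.6, 0.5), (0.9, 0.4), (0.4, 0.7)]
frontier = pareto_frontier(checkpoints)
```

Under this reading, choosing the deployment policy $\pi^{\text{deploy}}$ from the adapted policy set $\Pi^+(C)$ amounts to picking one point on the frontier according to how a practitioner weighs success against realism.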
