URB -- Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles
Ahmet Onur Akman, Anastasia Psarou, Michał Hoffmann, Łukasz Gorczyca, Łukasz Kowalski, Paweł Gora, Grzegorz Jamróz, Rafał Kucharski
TL;DR
URB addresses the lack of standardized benchmarks for multi-agent reinforcement learning (MARL) in large-scale urban routing of mixed autonomous and human-driven traffic. It assembles 29 real-world networks, realistic demand patterns, baselines, and a modular MARL toolkit into a single benchmarking framework, and launches the first URB leaderboard. The study finds that state-of-the-art MARL methods often underperform humans in city-scale routing tasks. These methods also incur high training costs and face scalability challenges, which underscores the need for methodological advances. By enabling reproducible, diverse, and realistic experiments, URB aims to catalyze progress toward safe, efficient, and socially aware routing of connected autonomous vehicles (CAVs) in urban environments.
Abstract
Connected Autonomous Vehicles (CAVs) promise to reduce congestion in future urban networks, potentially by optimizing their routing decisions. Unlike the decisions of human drivers, these decisions can be made by collective, data-driven policies developed with machine learning algorithms. Reinforcement learning (RL) can facilitate the development of such collective routing strategies, yet standardized and realistic benchmarks are missing. To that end, we present URB: Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles. URB is a comprehensive benchmarking environment that unifies evaluation across 29 real-world traffic networks paired with realistic demand patterns. URB comes with a catalog of predefined tasks, multi-agent RL (MARL) algorithm implementations, three baseline methods, domain-specific performance metrics, and a modular configuration scheme. Our results show that, despite lengthy and costly training, state-of-the-art MARL algorithms rarely outperform human drivers. The experimental results reported in this paper initiate the first leaderboard for MARL in large-scale urban routing optimization. They reveal that current approaches struggle to scale, underscoring the urgent need for advances in this domain.
