RouteRL: Multi-agent reinforcement learning framework for urban route choice with autonomous vehicles

Ahmet Onur Akman, Anastasia Psarou, Łukasz Gorczyca, Zoltán György Varga, Grzegorz Jamróz, Rafał Kucharski

TL;DR

RouteRL addresses the challenge of understanding autonomous vehicle routing in mixed urban traffic by marrying multi-agent reinforcement learning with a high-fidelity microscopic simulator. It introduces an open-source, modular framework that simulates day-to-day route choices for humans and MARL-trained AVs across OpenStreetMap networks, using behavioral models and various MARL algorithms. The work demonstrates experiments on Cologne, Ingolstadt, and Manhattan networks, showing how AV adoption, reward structures, and algorithm choice influence travel times and system performance, while emphasizing reproducibility and policy relevance. By providing a unified testbed with configurable networks, demand, and AV strategies, RouteRL enables rigorous comparisons of MARL approaches and supports research on equity, emissions, and urban mobility in the presence of AV fleets.

Abstract

RouteRL is a novel framework that integrates multi-agent reinforcement learning (MARL) with a microscopic traffic simulation, facilitating the testing and development of efficient route choice strategies for autonomous vehicles (AVs). The proposed framework simulates the daily route choices of driver agents in a city, including two types: human drivers, emulated using behavioral route choice models, and AVs, modeled as MARL agents optimizing their policies for a predefined objective. RouteRL aims to advance research in MARL, transport modeling, and human-AI interaction for transportation applications. This study presents a technical report on RouteRL, outlines its potential research contributions, and showcases its impact via illustrative examples.
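
To make the day-to-day setting concrete, here is a minimal sketch of human drivers repeatedly choosing between two routes and updating their expectations from experience. It is not RouteRL's API or its behavioral model: the two-route network, the BPR-style congestion function, the exponential-smoothing update, and every name and parameter value (`FREE_FLOW`, `CAPACITY`, `alpha`, `epsilon`) are assumptions made for illustration.

```python
import random

# Hypothetical two-route network (values are illustrative, not from the paper).
FREE_FLOW = [10.0, 12.0]   # free-flow travel time per route, in minutes
CAPACITY = [40.0, 60.0]    # nominal capacity per route, in vehicles


def travel_times(counts):
    """BPR-style congestion: travel time grows with the flow/capacity ratio."""
    return [t0 * (1 + 0.15 * (n / cap) ** 4)
            for t0, cap, n in zip(FREE_FLOW, CAPACITY, counts)]


def simulate_days(n_humans=100, n_days=50, alpha=0.2, epsilon=0.1, seed=0):
    """Return the mean experienced travel time for each simulated day."""
    rng = random.Random(seed)
    # Each driver keeps an expected travel time for every route.
    expected = [list(FREE_FLOW) for _ in range(n_humans)]
    history = []
    for _ in range(n_days):
        # Each driver picks the route it expects to be fastest,
        # exploring a random route with probability epsilon.
        choices = []
        for exp in expected:
            if rng.random() < epsilon:
                choices.append(rng.randrange(len(FREE_FLOW)))
            else:
                choices.append(min(range(len(exp)), key=exp.__getitem__))
        counts = [choices.count(r) for r in range(len(FREE_FLOW))]
        times = travel_times(counts)
        # Drivers update only the route they actually experienced.
        for exp, r in zip(expected, choices):
            exp[r] = (1 - alpha) * exp[r] + alpha * times[r]
        history.append(sum(times[r] for r in choices) / n_humans)
    return history
```

Calling `simulate_days()` returns one mean travel time per day, which can be plotted to see how the toy population settles across the two routes.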

Paper Structure

This paper contains 18 sections, 9 figures, 2 tables.

Figures (9)

  • Figure 1: The software at a glance: RouteRL can simulate any OSM network on which a set of agents seek to reach their destinations. The SUMO traffic microsimulator simulates how vehicles traverse the network. Over a sequence of days (episodes), agents iteratively learn to optimize their routing decisions (arrive faster). Humans use behavioral models, and AVs use policies trained with state-of-the-art MARL algorithms implemented via TorchRL.
  • Figure 2: A typical multi-agent training pipeline using RouteRL: The training procedure alternates between two stages: (i) Sampling (left), where the traffic microsimulator (SUMO) returns the rewards resulting from the route choices made by all agents, and (ii) Training (right), where the collected experiences are used to train AV policies via TorchRL’s MARL algorithm implementations. Policy parameters are iteratively updated and fed back into the sampling loop.
  • Figure 3: Networks: Small (part of Cologne), medium (part of Ingolstadt), and big (Manhattan) networks, which agents (human drivers and AVs) traverse to reach their destinations in the illustrative experiments. Every day, agents choose routes from a discrete set of options, illustrated in (d) for the Cologne network.
  • Figure 4: Results from the experiment on the Cologne network. 40 out of 100 human drivers transition into AVs with 'malicious' behavior. These results suggest that the actions taken by AVs in the Cologne network can differ from those taken by humans (left), and that the AVs' actions will not only make AVs arrive faster but also make the humans travel longer (right).
  • Figure 5: Mean rewards of AVs with various algorithms and strategies. In Ingolstadt, 20 out of 100 human agents transition to selfish AVs (left), and in Cologne, 40 out of 100 human agents transition to malicious AVs (right). The results show IQL and VDN yield superior returns in their respective experiments.
  • ...and 4 more figures
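
The alternation between sampling and training described in the Figure 2 caption can be sketched in a few lines. This is a toy illustration under stated assumptions, not the RouteRL or TorchRL implementation: `toy_simulator` is a hypothetical stand-in for SUMO, the learners are stateless tabular independent Q-learners rather than neural policies, the reward is each AV's own negative travel time, and the human flows are held fixed.

```python
import random


def toy_simulator(av_choices, human_counts=(30, 30)):
    """Hypothetical stand-in for the microsimulator.

    Takes one route index per AV and returns one reward per AV
    (the negative of that AV's travel time on a two-route toy network).
    """
    free_flow, capacity = (10.0, 12.0), (40.0, 60.0)
    counts = list(human_counts)          # fixed human flows (assumption)
    for route in av_choices:
        counts[route] += 1
    times = [t0 * (1 + 0.15 * (n / c) ** 4)
             for t0, c, n in zip(free_flow, capacity, counts)]
    return [-times[route] for route in av_choices]


def train(n_avs=40, n_iters=30, days_per_iter=5, lr=0.1, epsilon=0.1, seed=0):
    """Alternate sampling and training; return Q-tables and mean rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_avs)]   # one Q-value per route, per AV
    mean_rewards = []
    for _ in range(n_iters):
        # (i) Sampling: run several days with the current policies.
        buffer = []
        for _ in range(days_per_iter):
            actions = [rng.randrange(2) if rng.random() < epsilon
                       else max((0, 1), key=q_i.__getitem__)
                       for q_i in q]
            rewards = toy_simulator(actions)
            buffer.append((actions, rewards))
        # (ii) Training: update each AV independently from collected data.
        for actions, rewards in buffer:
            for i, (a, r) in enumerate(zip(actions, rewards)):
                q[i][a] += lr * (r - q[i][a])
        mean_rewards.append(
            sum(sum(r) for _, r in buffer) / (days_per_iter * n_avs))
    return q, mean_rewards
```

The updated Q-tables play the role of the policy parameters that are fed back into the next sampling stage; `train()` returns them together with the mean AV reward per iteration.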