Robo-taxi Fleet Coordination at Scale via Reinforcement Learning

Luigi Tresca, Carolin Schmidt, James Harrison, Filipe Rodrigues, Gioele Zardini, Daniele Gammelli, Marco Pavone

TL;DR

This work tackles large-scale AMoD fleet coordination by marrying optimization, graph representation learning, and reinforcement learning into a Graph RL framework. It introduces a hierarchical three-step policy: (i) convex dispatching for passenger matching, (ii) a learned per-node vehicle distribution guiding future states, and (iii) a minimum-cost flow to realize rebalancing, all facilitated by Graph Neural Networks. Across macroscopic and mesoscopic simulations, the approach achieves near-optimal profits close to MPC-Oracle with substantially reduced rebalancing costs and demonstrates strong transferability across cities, granularities, and simulator fidelities, aided by meta-RL and offline learning options. The open-source benchmarks, datasets, and simulators enable reproducible evaluation and standardized comparisons, advancing practical deployment of AMoD systems.
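Steps (ii) and (iii) of the policy above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the learned policy is replaced by a fixed vector of per-region shares, and the function names (`desired_counts`, `min_cost_rebalancing`), the largest-remainder rounding, and the restriction to direct trips from surplus regions to deficit regions are all assumptions made for illustration. The flow problem is solved with a textbook successive-shortest-path routine on integer costs.

```python
from collections import deque


def desired_counts(shares, total):
    """Round fractional per-region shares to integer counts summing to `total`."""
    raw = [s * total for s in shares]
    counts = [int(r) for r in raw]
    leftover = total - sum(counts)
    # Give the remaining vehicles to the regions with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def min_cost_rebalancing(idle, desired, cost):
    """Return ({(i, j): vehicles moved}, total cost) for direct surplus-to-deficit trips."""
    assert sum(idle) == sum(desired), "fleet size must be conserved"
    n = len(idle)
    surplus = [idle[i] - desired[i] for i in range(n)]
    src, snk = n, n + 1
    # Each edge is [target, residual capacity, unit cost, index of reverse edge].
    graph = [[] for _ in range(n + 2)]

    def add_edge(u, v, cap, c):
        graph[u].append([v, cap, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])

    for i in range(n):
        if surplus[i] > 0:
            add_edge(src, i, surplus[i], 0)
        elif surplus[i] < 0:
            add_edge(i, snk, -surplus[i], 0)

    trips = []  # (origin, destination, index of the edge in graph[origin])
    for i in range(n):
        for j in range(n):
            if surplus[i] > 0 and surplus[j] < 0:
                trips.append((i, j, len(graph[i])))
                add_edge(i, j, surplus[i], cost[i][j])

    total_cost = 0
    while True:
        # Queue-based Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [float("inf")] * (n + 2)
        prev = [None] * (n + 2)
        dist[src] = 0
        queue, queued = deque([src]), [False] * (n + 2)
        while queue:
            u = queue.popleft()
            queued[u] = False
            for k, (v, cap, c, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + c < dist[v]:
                    dist[v], prev[v] = dist[u] + c, (u, k)
                    if not queued[v]:
                        queue.append(v)
                        queued[v] = True
        if prev[snk] is None:
            break  # every deficit region has been served
        path, v = [], snk
        while v != src:
            path.append(prev[v])
            v = prev[v][0]
        push = min(graph[u][k][1] for u, k in path)
        for u, k in path:
            edge = graph[u][k]
            edge[1] -= push
            graph[edge[0]][edge[3]][1] += push
        total_cost += push * dist[snk]

    flows = {(i, j): surplus[i] - graph[i][k][1] for i, j, k in trips}
    return {ij: f for ij, f in flows.items() if f > 0}, total_cost


# Hypothetical 4-region example: idle vehicles, policy shares, symmetric travel costs.
idle = [5, 0, 4, 1]
target = desired_counts([0.2, 0.3, 0.2, 0.3], sum(idle))
cost = [[0, 1, 2, 4], [1, 0, 2, 3], [2, 2, 0, 1], [4, 3, 1, 0]]
print(min_cost_rebalancing(idle, target, cost))
```

Separating the learned target distribution from the flow computation means the learned component only has to output one value per region, while the routing of individual vehicles is left to an exact solver.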

Abstract

Fleets of robo-taxis offering on-demand transportation services, commonly known as Autonomous Mobility-on-Demand (AMoD) systems, hold significant promise for societal benefits, such as reducing pollution, energy consumption, and urban congestion. However, orchestrating these systems at scale remains a critical challenge, with existing coordination algorithms often failing to exploit the systems' full potential. This work introduces a novel decision-making framework that unites mathematical modeling with data-driven techniques. In particular, we present the AMoD coordination problem through the lens of reinforcement learning and propose a graph network-based framework that exploits the main strengths of graph representation learning, reinforcement learning, and classical operations research tools. Extensive evaluations across diverse simulation fidelities and scenarios demonstrate the flexibility of our approach, achieving superior system performance, computational efficiency, and generalizability compared to prior methods. Finally, motivated by the need to democratize research efforts in this area, we release publicly available benchmarks, datasets, and simulators for network-level coordination alongside an open-source codebase designed to provide accessible simulation platforms and establish a standardized validation process for comparing methodologies. Code available at: https://github.com/StanfordASL/RL4AMOD

Paper Structure

This paper contains 20 sections, 3 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Illustration of the proposed hierarchical decomposition for AMoD fleet coordination. Given the current distribution of idle vehicles and customer transportation requests, the decomposition entails: (1) assigning idle vehicles to trip requests (i.e., $x_{ij}^t$) by solving a convex dispatching problem; (2) determining a desired future allocation of vehicles across regions (i.e., $\hat{\mathbf{s}}^t$) via RL; and (3) converting $\hat{\mathbf{s}}^t$ into actionable rebalancing trips (i.e., $y_{ij}^t$) while minimizing the overall rebalancing cost.
  • Figure 2: (Left) Comparison of computation times between Graph-RL and MPC. (Right) Fine-tuning performance of pre-trained policy (FT) vs. training from scratch (Online-RL).
  • Figure 3: Distribution of the average passenger waiting time across the city for the policies considered in the study.
  • Figure 4: Operator and network performance across reward structures. (Top) Profit, revenue, and rebalancing cost. (Bottom) Average waiting time and fleet utilization factor.
  • Figure 5: Comparison of operator KPIs between a fully retrained Graph-RL policy and its zero-shot performance across different network granularities, measured as deviations from Oracle.
  • ...and 1 more figure