Real-time Control of Electric Autonomous Mobility-on-Demand Systems via Graph Reinforcement Learning

Aaryan Singhal, Daniele Gammelli, Justin Luke, Karthik Gopalakrishnan, Dominik Helmreich, Marco Pavone

TL;DR

This paper tackles real-time control of Electric Autonomous Mobility-on-Demand (E-AMoD) fleets by jointly optimizing matching, rebalancing, and charging. It develops a graph-RL framework that operates on a space-charge graph and decomposes each time step into three steps: matching ride requests to vehicles, using a learned policy to produce a target vehicle distribution, and solving a tractable linear program for the rebalancing and charging flows that best reach that target. On real-world data from San Francisco and New York City, the approach achieves up to 89% of the profit of the theoretically optimal solution with more than a 100x speedup in computation time, and shows promising zero-shot transfer across cities and to expanded service areas. It also outperforms the best domain-specific heuristics with comparable runtimes, increasing profits by up to 3.2x.

Abstract

Operators of Electric Autonomous Mobility-on-Demand (E-AMoD) fleets need to make several real-time decisions, such as matching available vehicles to ride requests, rebalancing idle vehicles to areas of high demand, and charging vehicles to ensure sufficient range. While this problem can be posed as a linear program that optimizes flows over a space-charge-time graph, the size of the resulting optimization problem does not allow for real-time implementation in realistic settings. In this work, we present the E-AMoD control problem through the lens of reinforcement learning and propose a graph network-based framework to achieve drastically improved scalability and superior performance over heuristics. Specifically, we adopt a bi-level formulation where we (1) leverage a graph network-based RL agent to specify a desired next state in the space-charge graph, and (2) solve more tractable linear programs to best achieve the desired state while ensuring feasibility. Experiments using real-world data from San Francisco and New York City show that our approach achieves up to 89% of the profits of the theoretically optimal solution while delivering more than a 100x speedup in computational time. We further highlight promising zero-shot transfer capabilities of our learned policy on tasks such as inter-city generalization and service area expansion, thus showing the utility, scalability, and flexibility of our framework. Finally, our approach outperforms the best domain-specific heuristics with comparable runtimes, increasing profits by up to 3.2x.
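
To make the idea of a "desired next state in the space-charge graph" more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that nodes are (region, discrete charge level) pairs, that each trip consumes an integer number of charge levels, and that the agent's output is a distribution over nodes that is scaled to the fleet size. The names `build_space_charge_graph`, `target_counts`, `energy_cost`, and `charge_step`, as well as the largest-remainder rounding, are hypothetical choices made for illustration.

```python
from itertools import product


def build_space_charge_graph(regions, max_charge, energy_cost, charge_step=1):
    """Return (nodes, edges) of a toy space-charge graph (illustrative only).

    Nodes are (region, charge_level) pairs. A directed edge models either a
    trip between regions, which lowers the charge level, or a charging action
    that stays in the same region and raises it.
    """
    nodes = list(product(regions, range(max_charge + 1)))
    edges = []
    for (i, c) in nodes:
        # Travel edges: feasible only if the vehicle has enough charge.
        for j in regions:
            if j != i:
                need = energy_cost[(i, j)]
                if c >= need:
                    edges.append(((i, c), (j, c - need)))
        # Charging edge: stay in region i and gain charge_step levels.
        if c + charge_step <= max_charge:
            edges.append(((i, c), (i, c + charge_step)))
    return nodes, edges


def target_counts(shares, fleet_size):
    """Turn a desired distribution over nodes into integer vehicle counts.

    Assumption: largest-remainder rounding, so counts sum to fleet_size.
    """
    raw = {n: s * fleet_size for n, s in shares.items()}
    counts = {n: int(v) for n, v in raw.items()}  # floor for non-negative v
    leftover = fleet_size - sum(counts.values())
    by_remainder = sorted(raw, key=lambda n: raw[n] - counts[n], reverse=True)
    for n in by_remainder[:leftover]:
        counts[n] += 1
    return counts


# Toy instance: two regions, charge levels 0..2, one level per trip.
regions = ["A", "B"]
energy_cost = {("A", "B"): 1, ("B", "A"): 1}
nodes, edges = build_space_charge_graph(regions, max_charge=2, energy_cost=energy_cost)

# Hypothetical agent output: desired share of the fleet at each node.
shares = {n: 0.0 for n in nodes}
shares[("A", 2)] = 0.5
shares[("B", 1)] = 0.3
shares[("B", 2)] = 0.2
targets = target_counts(shares, fleet_size=7)
```

In this sketch, `targets` plays the role of the desired next state. The lower-level optimization would then choose flows along `edges` that bring the fleet as close to `targets` as feasibility allows.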

Paper Structure (18 sections, 4 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: A visual representation of the tri-level framework for a given time step $t$. Step 1 (left) involves matching ride requests to vehicles. Step 2 (center) uses the policy learned through RL to compute an ideal redistribution of vehicles over the space-charge graph $\mathcal{G}$, i.e., $\mathbf{a}_{t}$. Step 3 (right) computes the spatial rebalancing and charging flows to achieve (as best as possible) the target distribution given by $\mathbf{a}_{t}$.
  • Figure 2: Average served demand and operational cost comparison on San Francisco and New York (5, 10, 15, 20) scenarios.
  • Figure 3: Comparison of computation times between optimization (MPC-oracle, orange), graph-RL (blue), and heuristics (green).
  • Figure 4: Reward curve when training SF20 from scratch compared with zero-shot performance of SF5, SF10, and SF15 agents when deployed on SF20.
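
Step 3 in Figure 1 can be read as a small flow problem on the space-charge graph. The following is one plausible way to write it, offered as a sketch rather than the paper's exact formulation. All symbols other than $\mathcal{G}$ and $\mathbf{a}_{t}$ are assumptions introduced here: $\mathcal{V}$ and $\mathcal{E}$ are the nodes and edges of $\mathcal{G}$, $m_i$ is the number of idle vehicles at node $i$ after matching, $M = \sum_{i} m_i$, $\hat{m}_i = a_{t,i} M$ is the target count implied by $\mathbf{a}_{t}$, $y_{ij}$ is the flow sent along edge $(i,j)$, $c_{ij}$ is its cost, and $\lambda > 0$ weights deviation from the target.

```latex
\begin{aligned}
\min_{y \ge 0} \quad
  & \sum_{(i,j) \in \mathcal{E}} c_{ij}\, y_{ij}
    + \lambda \sum_{i \in \mathcal{V}}
      \Bigl|\, m_i + \sum_{j:(j,i) \in \mathcal{E}} y_{ji}
               - \sum_{j:(i,j) \in \mathcal{E}} y_{ij}
               - \hat{m}_i \,\Bigr| \\
\text{s.t.} \quad
  & \sum_{j:(i,j) \in \mathcal{E}} y_{ij} \le m_i
    \qquad \forall i \in \mathcal{V}
\end{aligned}
```

The absolute values can be replaced with non-negative slack variables in the usual way, so the problem remains a linear program. The constraint ensures that no node sends out more vehicles than it currently holds, which is what keeps the target from $\mathbf{a}_{t}$ feasible only "as best as possible".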