Atomic Proximal Policy Optimization for Electric Robo-Taxi Dispatch and Charger Allocation
Jim Dai, Manxi Wu, Zhanhao Zhang
TL;DR
The paper tackles joint dispatch, repositioning, and charging of an electric robo-taxi fleet in a stochastic urban setting, modeled as an infinite-horizon, long-run average-reward MDP. It introduces Atomic-PPO, which uses atomic-action decomposition to reduce the action space to a constant size and state aggregation (over battery levels and trip orders) to keep the neural networks tractable, all within a Proximal Policy Optimization framework. A fluid-approximation LP provides a theoretical upper bound on the achievable reward, against which policy quality is benchmarked; in experiments on NYC data, Atomic-PPO achieves about 91% of this bound and substantially outperforms baselines in trip fulfillment and fleet utilization. The results also yield practical insights on charger deployment: placing chargers according to ridership is more efficient than placing them uniformly, and fast charging matters more than extended vehicle range for city-scale EV fleets.
Abstract
Pioneering companies such as Waymo have deployed robo-taxi services in several U.S. cities. These robo-taxis are electric vehicles, and their operations require the joint optimization of ride matching, vehicle repositioning, and charging scheduling in a stochastic environment. We model the operations of the ride-hailing system with robo-taxis as a discrete-time, average-reward Markov Decision Process with an infinite horizon. As the fleet size grows, dispatching becomes challenging, as both the system state space and the fleet dispatching action space grow exponentially with the number of vehicles. To address this, we introduce a scalable deep reinforcement learning algorithm, called Atomic Proximal Policy Optimization (Atomic-PPO), that reduces the action space using atomic action decomposition. We evaluate our algorithm using real-world NYC for-hire vehicle trip records and measure its performance by the long-run average reward achieved by the dispatching policy, relative to a fluid-based upper bound. Our experiments demonstrate the superior performance of Atomic-PPO compared to benchmark methods. Furthermore, we conduct extensive numerical experiments to analyze the efficient allocation of charging facilities and assess the impact of vehicle range and charger speed on system performance.
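To make the scaling argument concrete, the following is a minimal Python sketch of the general idea behind decomposing a fleet-level decision into per-vehicle atomic actions. It is an illustration under simplifying assumptions, not the paper's implementation: the action names in `ATOMIC_ACTIONS`, the helper functions, and the random placeholder policy are all hypothetical, and the paper's actual state representation, action set, and policy network are not reproduced here.

```python
import random

# Hypothetical atomic action set for illustration only; the paper's
# actual atomic actions are not specified in this section.
ATOMIC_ACTIONS = ["serve_trip", "reposition", "charge", "idle"]


def joint_action_count(num_vehicles: int, num_atomic_actions: int) -> int:
    """Number of joint fleet actions if each vehicle independently
    picks one of `num_atomic_actions` options (a simplifying assumption)."""
    return num_atomic_actions ** num_vehicles


def sequential_dispatch(vehicle_ids, atomic_actions, choose):
    """Build a fleet-level decision one vehicle at a time.

    `choose` sees the vehicle, the partial assignment so far, and the
    fixed list of atomic actions, so each individual decision is made
    over a set whose size does not depend on the fleet size.
    """
    assignment = {}
    for vehicle in vehicle_ids:
        assignment[vehicle] = choose(vehicle, assignment, atomic_actions)
    return assignment


def random_choice_policy(vehicle, partial_assignment, actions):
    """Placeholder policy: a learned policy network would go here."""
    return random.choice(actions)


fleet = list(range(10))
plan = sequential_dispatch(fleet, ATOMIC_ACTIONS, random_choice_policy)

# The joint space grows exponentially with fleet size ...
print(joint_action_count(len(fleet), len(ATOMIC_ACTIONS)))  # 4**10 = 1048576
# ... while each atomic decision is over a constant number of options.
print(len(ATOMIC_ACTIONS))  # 4
```

In this toy setting a 10-vehicle fleet already has over a million joint actions, whereas the sequential scheme makes 10 decisions over 4 options each. This is the sense in which the per-decision action space stays constant as the fleet grows.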
