Learning How to Dynamically Route Autonomous Vehicles on Shared Roads
Daniel A. Lazar, Erdem Bıyık, Dorsa Sadigh, Ramtin Pedarsani
TL;DR
The paper tackles congestion in mixed-autonomy traffic by learning to route autonomous vehicles so that human drivers are indirectly steered toward more efficient equilibria. It combines a road model based on the cell transmission model (CTM), in which capacity depends on the fraction of autonomous vehicles, with Hedge dynamics for human route choice and an autonomous-routing policy trained with proximal policy optimization (PPO). The approach is analyzed through both equilibrium theory and simulation. The theoretical results characterize path and network equilibria on parallel networks and give a polynomial-time method for computing the best equilibrium under restrictive assumptions. Empirically, the learned policy stabilizes queues, approaches the best equilibria under disturbances, and outperforms MPC and greedy baselines across network scales, suggesting that dynamic autonomous routing has practical potential for congestion mitigation.
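To make the "Hedge dynamics" for human routing concrete, the following is a minimal sketch of the generic Hedge (multiplicative-weights) update, not the paper's exact formulation, which this summary does not spell out. The function name `hedge_update`, the learning rate `eta`, and the fixed route costs are illustrative assumptions; in the actual model, route costs would be latencies that change with traffic.

```python
import math

def hedge_update(probs, costs, eta=0.5):
    """One generic Hedge step: shift probability mass toward cheaper routes.

    probs: current route-choice probabilities (sums to 1)
    costs: observed cost of each route (hypothetical fixed values here)
    eta:   learning rate (assumed parameter, not taken from the paper)
    """
    # Exponentially down-weight each route by its cost, then renormalize.
    weights = [p * math.exp(-eta * c) for p, c in zip(probs, costs)]
    total = sum(weights)
    return [w / total for w in weights]

# Two parallel routes; route 0 is cheaper, so its share grows each step.
probs = [0.5, 0.5]
for _ in range(3):
    probs = hedge_update(probs, costs=[1.0, 2.0])
```

With the costs held fixed, the distribution keeps drifting toward the cheaper route; when costs instead rise with congestion, the same update can settle where the routes in use have equal cost.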
Abstract
Road congestion induces significant costs across the world, and road network disturbances, such as traffic accidents, can cause highly congested traffic patterns. If a planner had control over the routing of all vehicles in the network, they could easily reverse this effect. We consider a more realistic scenario, in which the planner controls only the autonomous cars, which make up a fraction of all cars present. We study a dynamic routing game in which the route choices of autonomous cars can be controlled and the human drivers react selfishly and dynamically. As the problem is prohibitively large, we use deep reinforcement learning to learn a policy for controlling the autonomous vehicles. This policy indirectly influences human drivers to route themselves in a way that minimizes congestion on the network. To gauge the effectiveness of our learned policies, we establish theoretical results characterizing equilibria and empirically compare the learned policy's results with the best possible equilibria. We prove properties of equilibria on parallel roads and provide a polynomial-time optimization for computing the most efficient equilibrium. Moreover, we show that in the absence of these policies, high demand and network perturbations result in severe congestion, whereas using the policy greatly decreases travel times by reducing congestion. To the best of our knowledge, this is the first work that employs deep reinforcement learning to reduce congestion by indirectly influencing humans' routing decisions in mixed-autonomy traffic.
