Learning How to Dynamically Route Autonomous Vehicles on Shared Roads

Daniel A. Lazar, Erdem Bıyık, Dorsa Sadigh, Ramtin Pedarsani

TL;DR

The paper tackles congestion in mixed-autonomy traffic by learning how to route autonomous vehicles to indirectly steer human drivers toward more efficient equilibria. It combines a CTM-based road model with autonomy-dependent capacity, Hedge dynamics for human routing, and a PPO-powered autonomous-routing policy, analyzed through both equilibrium theory and extensive simulations. Theoretical results characterize path and network equilibria in parallel-network settings and show a polynomial-time method to compute the best equilibrium under restricted assumptions. Empirically, the RL policy stabilizes queues, approaches the best equilibria under disturbances, and outperforms MPC and greedy baselines across network scales, suggesting practical potential for congestion mitigation via dynamic autonomous routing.
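To make the "Hedge dynamics for human routing" ingredient concrete, here is a minimal sketch of a generic Hedge (multiplicative-weights) update over route choices. It is not the paper's exact formulation: the function name `hedge_update`, the `learning_rate` value, and the two-route latencies are all assumptions chosen for illustration.

```python
import math

def hedge_update(route_probs, route_latencies, learning_rate=0.5):
    """One generic Hedge step: routes with higher latency lose probability mass.

    route_probs     -- current fraction of human drivers choosing each route
    route_latencies -- latency observed on each route (illustrative units)
    learning_rate   -- hypothetical step size; not a value taken from the paper
    """
    weights = [
        p * math.exp(-learning_rate * latency)
        for p, latency in zip(route_probs, route_latencies)
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Two routes, initially split evenly; route 0 is slower than route 1.
probs = [0.5, 0.5]
latencies = [2.0, 1.0]
updated = hedge_update(probs, latencies)
```

Under such an update, human drivers shift gradually toward whichever route currently looks faster, which is the kind of selfish, dynamic reaction the autonomous-vehicle routing policy has to anticipate.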

Abstract

Road congestion induces significant costs across the world, and road network disturbances, such as traffic accidents, can cause highly congested traffic patterns. If a planner had control over the routing of all vehicles in the network, they could easily reverse this effect. In a more realistic scenario, we consider a planner that controls autonomous cars, which are a fraction of all present cars. We study a dynamic routing game, in which the route choices of autonomous cars can be controlled and the human drivers react selfishly and dynamically. As the problem is prohibitively large, we use deep reinforcement learning to learn a policy for controlling the autonomous vehicles. This policy indirectly influences human drivers to route themselves in such a way that minimizes congestion on the network. To gauge the effectiveness of our learned policies, we establish theoretical results characterizing equilibria and empirically compare the learned policy results with best possible equilibria. We prove properties of equilibria on parallel roads and provide a polynomial-time optimization for computing the most efficient equilibrium. Moreover, we show that in the absence of these policies, high demand and network perturbations would result in large congestion, whereas using the policy greatly decreases the travel times by minimizing the congestion. To the best of our knowledge, this is the first work that employs deep reinforcement learning to reduce congestion by indirectly influencing humans' routing decisions in mixed-autonomy traffic.

Paper Structure

This paper contains 18 sections, 8 theorems, 29 equations, 10 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Under Assumption asmp:parallel a path $p$ with flow dynamics described in Section sec:flow_dynamics that is at Path Equilibrium will have the same autonomy level in all cells. Denote this autonomy level $\alpha_{p}$. If the vehicle flow demand is strictly less than the minimum cell capacity, the pat

Figures (10)

  • Figure 1: The schematic diagram of our framework. Our deep RL agent processes the state of the traffic and outputs a control policy for autonomous cars' routing.
  • Figure 2: (a) Fundamental diagram of traffic governing vehicle flow in each cell of the Cell Transmission Model. The solid line corresponds to a cell with only human-driven vehicles; the dashed line represents a cell with both vehicle types at autonomy level $\alpha_i$. Green and red respectively represent a cell in free-flow and congestion. (b) The flow from one cell to another is a function of the density $n$ and autonomy level $\alpha$ in each cell. In both figures, we suppress the notation for path $p$.
  • Figure 3: The small general class network used for experiments.
  • Figure 4: Time vs. number of cars under selfish, MPC and RL routing on the small general class network.
  • Figure 5: OW network (adapted from de2011modelling) used for experiments.
  • ...and 5 more figures
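The fundamental diagram described in the Figure 2(a) caption can be sketched as a triangular flow-density curve whose peak shifts with the autonomy level. The sketch below is an illustration under stated assumptions, not the paper's model: the headway-based form of `critical_density`, the fixed `jam_density`, and every numeric default are hypothetical.

```python
def critical_density(alpha, human_headway=2.0, autonomous_headway=1.0):
    """Density at which a cell reaches capacity (hypothetical form).

    Assumes the mean headway is a convex combination of a human headway and
    a shorter autonomous headway, weighted by the autonomy level alpha.
    """
    mean_headway = alpha * autonomous_headway + (1.0 - alpha) * human_headway
    return 1.0 / mean_headway

def cell_flow(density, alpha, free_flow_speed=1.0, jam_density=2.0):
    """Triangular fundamental diagram: flow as a function of cell density."""
    n_crit = critical_density(alpha)
    if density <= n_crit:
        # Free-flow branch: flow grows linearly with density.
        return free_flow_speed * density
    # Congested branch: flow falls linearly to zero at the jam density.
    capacity = free_flow_speed * n_crit
    shockwave_speed = capacity / (jam_density - n_crit)
    return max(0.0, shockwave_speed * (jam_density - density))

# Same density, different autonomy levels.
human_only_flow = cell_flow(0.75, alpha=0.0)   # congested branch
all_autonomous_flow = cell_flow(0.75, alpha=1.0)  # still in free-flow
```

With these illustrative numbers, a density that already congests a human-only cell stays in free-flow when the cell is fully autonomous, which corresponds to the gap between the solid and dashed curves in the caption.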

Theorems & Definitions (16)

  • Definition 1: Path Equilibrium
  • Definition 2: Network Equilibrium
  • Definition 3: Bottleneck
  • Theorem 1
  • proof
  • Lemma 1
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • ...and 6 more