Cooperative Advisory Residual Policies for Congestion Mitigation

Aamir Hasan, Neeloy Chakraborty, Haonan Chen, Jung-Hoon Cho, Cathy Wu, Katherine Driggs-Campbell

TL;DR

This work tackles congestion mitigation by coupling a learning-based cooperative advisory system with human drivers through residual policies. By redefining the reward to promote network-wide speed and safer headways, and by explicitly modeling driver behavior through a driver policy and unsupervised trait inference, the approach needs to advise only a single human-driven ego vehicle. The key contributions are a residual policy framework (RP) and its personalized variant (PeRP), which relies on a driver-trait inference module. Both are validated in simulation and in a driving-simulator user study, with improvements over baselines of up to 20% and 40%, respectively, on a combined speed and speed-variation metric, along with evidence of human compatibility. The results suggest practical potential for onboard advisory systems that adapt to diverse driver behaviors, reducing congestion and emissions while easing real-world deployment hurdles.

Abstract

Fleets of autonomous vehicles can mitigate traffic congestion through simple actions, improving socioeconomic factors such as commute time and gas costs. However, these approaches are limited in practice: they assume precise control over autonomous vehicle fleets, incur extensive installation costs for a centralized sensor ecosystem, and fail to account for uncertainty in driver behavior. To this end, we develop a class of learned residual policies that can be used in cooperative advisory systems and require only a single vehicle with a human driver. Our policies advise drivers to behave in ways that mitigate traffic congestion while accounting for diverse driver behaviors, particularly drivers' reactions to instructions, to provide an improved user experience. To realize such policies, we introduce an improved reward function that explicitly addresses congestion mitigation and driver attitudes toward advice. We show that our residual policies can be personalized by conditioning them on an inferred driver trait that is learned in an unsupervised manner with a variational autoencoder. Our policies are trained in simulation with our novel instruction adherence driver model and evaluated both in simulation and through a user study (N=16) that captures the sentiments of human drivers. Our results show that our approaches successfully mitigate congestion while adapting to different driver behaviors, improving over baselines by up to 20% in our simulation tests and up to 40% in our user study, as measured by a combined metric of speed and speed deviation over time. The user study further shows that our policies are human-compatible and personalize to drivers.
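The residual idea described above (and in Figure 2) can be illustrated with a minimal sketch, assuming the advice is a scalar target speed and that the residual is simply added to the base policy's output and clipped to a valid range. The function name `residual_advice`, the clipping bounds, the toy policies, and the choice to append the inferred trait vector to the residual policy's input (as a stand-in for PeRP's conditioning) are hypothetical illustrations, not the authors' implementation.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical signatures: each policy maps a feature vector to a scalar speed (m/s).
Policy = Callable[[Sequence[float]], float]


def residual_advice(
    obs: Sequence[float],
    base_policy: Policy,
    residual_policy: Policy,
    trait: Optional[Sequence[float]] = None,
    v_min: float = 0.0,
    v_max: float = 30.0,
) -> float:
    """Advised speed = base action + residual offset, clipped to [v_min, v_max].

    When `trait` (an inferred latent driver-trait vector) is given, it is
    concatenated to the residual policy's input, a stand-in for PeRP-style
    personalization. Without it, the sketch behaves like a plain RP variant.
    """
    base_action = base_policy(obs)
    residual_input: List[float] = list(obs) + (list(trait) if trait is not None else [])
    offset = residual_policy(residual_input)
    return min(max(base_action + offset, v_min), v_max)


# Toy policies for illustration only.
def toy_base(obs: Sequence[float]) -> float:
    return obs[0] + 2.0  # suggest 2 m/s above the current speed


def toy_residual(x: Sequence[float]) -> float:
    return -1.0 if len(x) > 1 else 0.5  # pretend a trait input lowers the advice


print(residual_advice([10.0], toy_base, toy_residual))               # 12.5
print(residual_advice([10.0], toy_base, toy_residual, trait=[0.3]))  # 11.0
```

Keeping the base policy fixed and learning only the offset means the residual can focus on making the base advice more actionable for a particular driver; whether the real residual also sees the base action as input is not stated here and is left out of the sketch.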

Paper Structure

This paper contains 38 sections, 9 equations, 14 figures, 5 tables, and 1 algorithm.

Figures (14)

  • Figure 1: An illustration of how advisory policies can help mitigate congestion. (a) Traffic congestion is present. (b) The advisory policy provides a suggestion to the driver of the vehicle marked with a blue square. (c) The driver follows the recommended advice. (d) Following the advice mitigates congestion. The lines under the cars represent their most recent speed profiles; only the car with the driver has the advisory system.
  • Figure 2: An overview of our method. Our residual policies offset the action output by the base policy to produce more actionable advice for drivers. Portions in blue indicate the additional data flow for our Personalized Residual Policy (PeRP), which also takes as input a latent vector representing the driver's trait. Portions in red indicate the data flow during evaluation with a human driver.
  • Figure 3: The simulation environment in CARLA. The red car in the center of the image is the ego vehicle. This image is inspired by Figure 1 in stern2018dissipation.
  • Figure 4: The driving simulator rig.
  • Figure 5: The speedometer showing the advised speed range (green area with red line) and the current speed. The current speed is displayed in green when the driver is within the advised range, and in white otherwise.
  • ...and 9 more figures
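The display rule in the Figure 5 caption can be sketched as follows, assuming the advised range is a closed interval centered on the advised speed (the red line) with a fixed half-width. The helper names `advised_range` and `speed_display_color` and the 2.0 half-width are hypothetical; the caption does not say how the range is derived.

```python
from typing import Tuple


def advised_range(advised_speed: float, half_width: float = 2.0) -> Tuple[float, float]:
    """Hypothetical: build the green advised band around the advised speed (red line)."""
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    return advised_speed - half_width, advised_speed + half_width


def speed_display_color(current_speed: float, band: Tuple[float, float]) -> str:
    """Green when the current speed lies inside the advised band (inclusive), white otherwise."""
    low, high = band
    return "green" if low <= current_speed <= high else "white"


band = advised_range(20.0)              # (18.0, 22.0)
print(speed_display_color(19.5, band))  # green
print(speed_display_color(23.0, band))  # white
```

Presenting a range rather than a single target speed gives the driver some tolerance, so small deviations around the advice still register as compliant on the speedometer.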