HONEST-CAV: Hierarchical Optimization of Network Signals and Trajectories for Connected and Automated Vehicles with Multi-Agent Reinforcement Learning

Ziyan Zhang, Changxin Wan, Peng Hao, Kanok Boriboonsomsin, Matthew J. Barth, Yongkang Liu, Seyhan Ucar, Guoyuan Wu

TL;DR

HONEST-CAV addresses the challenge of coordinating network-wide traffic signal control (TSC) with vehicle-level eco-driving in mixed HV/CAV traffic. It introduces a hierarchical framework with two layers: a multi-agent soft actor-critic (MASAC) MARL controller, trained centrally and executed in a decentralized manner (CTDE), for cycle-based TSC with SPaT prediction; and an imitation-learning-based MLTPA for real-time Eco-Approach and Departure. Key contributions include a scalable, asynchronous training scheme with a VDN-based global reward, a robust SPaT predictor that blends policy outputs with historical data, and an IL-based trajectory planner that reduces computation while maintaining near-optimal energy performance. Results show improvements in average speed, energy consumption, and idling time, which grow as CAV penetration increases. The work suggests strong potential for real-time deployment in large urban networks and indicates further gains with electrification and zone-level extensions.

Abstract

This study presents a hierarchical, network-level traffic flow control framework for mixed traffic consisting of Human-driven Vehicles (HVs) and Connected and Automated Vehicles (CAVs). The framework jointly optimizes vehicle-level eco-driving behaviors and intersection-level traffic signal control to enhance overall network efficiency and decrease energy consumption. A decentralized Multi-Agent Reinforcement Learning (MARL) approach based on a Value Decomposition Network (VDN) manages cycle-based traffic signal control (TSC) at intersections, while an innovative Signal Phase and Timing (SPaT) prediction method is integrated with a Machine Learning-based Trajectory Planning Algorithm (MLTPA) to guide CAVs in executing Eco-Approach and Departure (EAD) maneuvers. The framework is evaluated across varying CAV proportions and powertrain types to assess its effects on mobility and energy performance. Experiments on a 4×4 real-world network demonstrate that the MARL-based TSC method outperforms the baseline (i.e., the Webster method) in speed, fuel consumption, and idling time. With MLTPA added, HONEST-CAV further reduces energy consumption and idling time. At a 60% CAV proportion, average vehicle speed improves by 7.67%, while fuel consumption and idling time fall by 10.23% and 45.83%, respectively, compared with the baseline. Finally, the effects of CAV proportion and powertrain type are discussed to quantify how automation and electrification influence the performance of the proposed method.
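The value decomposition idea mentioned above can be illustrated with a minimal sketch. It shows only the generic VDN principle (the joint value is the sum of per-agent values, so each agent's greedy choice also maximizes the joint value) and is not the paper's MASAC-based implementation. The function names, the three-intersection setup, and the Q-values are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def greedy_actions(q_values: Sequence[Sequence[float]]) -> List[int]:
    """Each agent independently picks the action with its own highest Q-value."""
    return [max(range(len(q)), key=q.__getitem__) for q in q_values]


def vdn_joint_q(q_values: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """VDN assumption: the joint value is the sum of the per-agent values."""
    return sum(q[a] for q, a in zip(q_values, actions))


# Hypothetical Q-values: one row per intersection agent,
# one column per candidate signal timing action.
q_values = [
    [0.2, 0.9, 0.1],
    [1.5, 0.3, 0.4],
    [-0.2, 0.0, 0.6],
]

actions = greedy_actions(q_values)      # decentralized, per-agent choice
q_tot = vdn_joint_q(q_values, actions)  # additive joint value
print(actions, q_tot)
```

Because the joint value is additive, no other joint action can score higher than the combination of per-agent greedy choices, which is what allows each intersection to act on local information at execution time.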

Paper Structure

This paper contains 15 sections, 9 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: The Schematic Diagram of HONEST-CAV
  • Figure 2: The SPaT Prediction Algorithm
  • Figure 3: The Environment of Traffic Network
  • Figure 4: The Reward Training Curves
  • Figure 5: The Reward Training Curves with Random Proportions
  • ...and 1 more figure