HONEST-CAV: Hierarchical Optimization of Network Signals and Trajectories for Connected and Automated Vehicles with Multi-Agent Reinforcement Learning
Ziyan Zhang, Changxin Wan, Peng Hao, Kanok Boriboonsomsin, Matthew J. Barth, Yongkang Liu, Seyhan Ucar, Guoyuan Wu
TL;DR
HONEST-CAV addresses the challenge of coordinating network-wide traffic signal control (TSC) with vehicle-level eco-driving in mixed traffic of human-driven vehicles (HVs) and connected and automated vehicles (CAVs). It introduces a hierarchical framework with two layers. For cycle-based TSC, it uses multi-agent reinforcement learning (MARL) built on multi-agent soft actor-critic (MASAC) with centralized training and decentralized execution (CTDE). For real-time Eco-Approach and Departure (EAD), it couples Signal Phase and Timing (SPaT) prediction with an imitation-learning (IL) based Machine Learning-based Trajectory Planning Algorithm (MLTPA). Together, the two layers enable coordinated and energy-efficient operation. Key contributions include a scalable, asynchronous training scheme with a VDN-based global reward; a robust SPaT predictor that blends policy and historical data; and an IL-based trajectory planner that reduces computation while maintaining near-optimal energy performance. Results show improvements over the Webster baseline in average speed, fuel consumption, and idling time (7.67%, 10.23%, and 45.83%, respectively, at a 60% CAV proportion), with larger gains as CAV penetration increases. The work demonstrates strong potential for real-time deployment in large urban networks and indicates further gains with electrification and zone-level extensions.
Abstract
This study presents a hierarchical, network-level traffic flow control framework for mixed traffic consisting of Human-driven Vehicles (HVs) and Connected and Automated Vehicles (CAVs). The framework jointly optimizes vehicle-level eco-driving behaviors and intersection-level traffic signal control to enhance overall network efficiency and decrease energy consumption. A decentralized Multi-Agent Reinforcement Learning (MARL) approach based on a Value Decomposition Network (VDN) manages cycle-based traffic signal control (TSC) at intersections, while an innovative Signal Phase and Timing (SPaT) prediction method integrates a Machine Learning-based Trajectory Planning Algorithm (MLTPA) to guide CAVs in executing Eco-Approach and Departure (EAD) maneuvers. The framework is evaluated across varying CAV proportions and powertrain types to assess its effects on mobility and energy performance. Experiments conducted on a 4×4 real-world network demonstrate that the MARL-based TSC method outperforms the baseline model (i.e., the Webster method) in speed, fuel consumption, and idling time. With MLTPA added, HONEST-CAV further reduces energy consumption and idling time. At a 60% CAV proportion, average vehicle speed improves by 7.67%, while fuel consumption and idling time decrease by 10.23% and 45.83%, respectively, compared with the baseline. Finally, the effects of CAV proportion and powertrain type are analyzed to quantify how automation and electrification affect the performance of the proposed method.
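For readers unfamiliar with the baseline, the sketch below illustrates the classical Webster fixed-time calculation: the cycle length C = (1.5L + 5) / (1 - Y), with effective green split across phases in proportion to their critical flow ratios. It is a minimal illustration only. The function name `webster_timing`, its parameters, and the example inputs are hypothetical, and the paper's actual baseline configuration is not specified in this abstract.

```python
def webster_timing(flow_ratios, lost_time_s):
    """Webster's fixed-time plan for one intersection (illustrative sketch).

    flow_ratios: critical flow ratio y_i (demand / saturation flow) per phase.
    lost_time_s: total lost time L per cycle, in seconds.
    Returns (cycle_length_s, effective_green_s_per_phase).
    """
    total_ratio = sum(flow_ratios)  # Y, the sum of critical flow ratios
    if not 0.0 < total_ratio < 1.0:
        # The formula is only defined for an under-saturated intersection.
        raise ValueError("sum of flow ratios must lie strictly between 0 and 1")

    # Webster's cycle length: C = (1.5 L + 5) / (1 - Y)
    cycle = (1.5 * lost_time_s + 5.0) / (1.0 - total_ratio)

    # Distribute the effective green (C - L) in proportion to each y_i.
    effective_green = cycle - lost_time_s
    greens = [effective_green * y / total_ratio for y in flow_ratios]
    return cycle, greens


if __name__ == "__main__":
    # Hypothetical two-phase intersection with 10 s of lost time per cycle.
    cycle, greens = webster_timing([0.3, 0.2], lost_time_s=10.0)
    print(f"cycle = {cycle:.1f} s, greens = {[round(g, 1) for g in greens]}")
```

A plan of this kind is computed once from demand estimates and then held fixed, which is what the MARL-based controller is compared against.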
