Phase Re-service in Reinforcement Learning Traffic Signal Control
Zhiyao Zhang, George Gunter, Marcos Quinones-Grueiro, Yuhang Zhang, William Barbour, Gautam Biswas, Daniel Work
TL;DR
The paper addresses dynamic traffic patterns in adaptive signal control, focusing on long left-turn queues. It introduces a method that couples phase re-service with reinforcement learning: the RL agent selects the duration of the next phase, and a shock-wave-based estimator decides whether to insert a temporary re-service phase. Formulated as a semi-Markov decision process and solved with proximal policy optimization, the approach substantially reduces vehicle delays and stops across two intersection types and ten demand profiles. The work makes adaptive traffic signal control more flexible and is relevant to mitigating congestion at intersections with heavy left-turn demand.
Abstract
This article proposes a novel approach to traffic signal control that combines phase re-service with reinforcement learning (RL). The RL agent directly determines the duration of the next phase in a pre-defined sequence. Before the RL agent's decision is executed, we use shock wave theory to estimate queue expansion at the designated movement allowed for re-service and decide whether phase re-service is necessary. If it is, a temporary re-service phase is inserted before the next regular phase. We formulate the RL problem as a semi-Markov decision process (SMDP) and solve it with proximal policy optimization (PPO). A series of experiments shows significant improvements from introducing phase re-service. Average vehicle delay is reduced by up to 29.95%, and the standard deviation of delay by up to 59.21%. The number of stops is reduced by 26.05% on average, with a 45.77% lower standard deviation.
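To make the re-service check concrete, the following is a minimal Python sketch of a shock-wave-based queue estimate. It is not the authors' implementation. It assumes a simple kinematic-wave model in which the back of the queue moves upstream at speed w = q_a / (k_j - k_a) during red, and it assumes re-service is triggered when the predicted queue exceeds a storage length. The names `Approach`, `predicted_queue_m`, and `needs_reservice`, the threshold rule, and all parameter values are hypothetical; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Approach:
    """Hypothetical description of the movement eligible for re-service."""
    arrival_flow_vph: float     # q_a: arrival flow, veh/h/lane
    free_flow_speed_kmh: float  # v_f: used to derive arrival density
    jam_density_vpkm: float     # k_j: jam density, veh/km/lane
    storage_length_m: float     # assumed threshold, e.g. turn-bay length


def queue_growth_speed_kmh(a: Approach) -> float:
    """Speed at which the back of queue moves upstream during red.

    Shock wave between the arrival state (q_a, k_a) and the jammed
    state (0, k_j): |w| = q_a / (k_j - k_a), with k_a = q_a / v_f.
    """
    k_a = a.arrival_flow_vph / a.free_flow_speed_kmh
    return a.arrival_flow_vph / (a.jam_density_vpkm - k_a)


def predicted_queue_m(a: Approach, initial_queue_m: float, red_time_s: float) -> float:
    """Queue length expected after red_time_s seconds without service."""
    w_ms = queue_growth_speed_kmh(a) * 1000.0 / 3600.0  # km/h -> m/s
    return initial_queue_m + w_ms * red_time_s


def needs_reservice(a: Approach, initial_queue_m: float, red_time_s: float) -> bool:
    """Assumed decision rule: re-serve if the queue would exceed storage."""
    return predicted_queue_m(a, initial_queue_m, red_time_s) > a.storage_length_m


# Illustrative values only, not taken from the paper.
left_turn = Approach(
    arrival_flow_vph=300.0,
    free_flow_speed_kmh=50.0,
    jam_density_vpkm=156.0,
    storage_length_m=40.0,
)
long_red = needs_reservice(left_turn, initial_queue_m=0.0, red_time_s=90.0)
short_red = needs_reservice(left_turn, initial_queue_m=0.0, red_time_s=36.0)
```

In this sketch, `red_time_s` stands for the time until the movement would next be served, which in the described method would depend on the phase duration chosen by the RL agent. A longer chosen duration therefore makes a temporary re-service more likely under this assumed rule.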
