Phase Re-service in Reinforcement Learning Traffic Signal Control

Zhiyao Zhang, George Gunter, Marcos Quinones-Grueiro, Yuhang Zhang, William Barbour, Gautam Biswas, Daniel Work

TL;DR

The paper addresses dynamic traffic patterns in adaptive signal control, focusing on long left-turn queues. It couples phase re-service with reinforcement learning: the RL agent selects the duration of the next phase, and a shock-wave-based estimator decides whether to insert a temporary re-service phase. The problem is formulated as a semi-Markov decision process and solved with proximal policy optimization. In experiments across two intersection types and ten demand profiles, average vehicle delay falls by up to 29.95% and the average number of stops by 26.05%. The result is a more flexible adaptive signal controller, with practical relevance for intersections with heavy left-turn demand.

Abstract

This article proposes a novel approach to traffic signal control that combines phase re-service with reinforcement learning (RL). The RL agent directly determines the duration of the next phase in a pre-defined sequence. Before the RL agent's decision is executed, we use shock wave theory to estimate queue expansion at the movement designated for re-service and decide whether phase re-service is necessary. If it is, a temporary re-service phase is inserted before the next regular phase. We formulate the RL problem as a semi-Markov decision process (SMDP) and solve it with proximal policy optimization (PPO). A series of experiments shows significant improvements from introducing phase re-service. The average vehicle delay is reduced by up to 29.95%, and its standard deviation by up to 59.21%. The average number of stops is reduced by 26.05%, and its standard deviation by 45.77%.
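The re-service check described in the abstract can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes the textbook queue-forming shock wave between an arriving state (flow q_a, density k_a) and a jammed state (zero flow, density k_j), whose upstream speed is q_a / (k_j - k_a). The names `LaneFlow`, `predicted_queue_m`, and `needs_reservice`, and the rule of comparing the predicted queue against a storage length, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LaneFlow:
    """Hypothetical traffic-state inputs for the re-service movement."""
    arrival_flow_vph: float      # arriving flow q_a, vehicles per hour
    arrival_density_vpkm: float  # arriving density k_a, vehicles per km
    jam_density_vpkm: float      # jam density k_j, vehicles per km


def queue_growth_speed_kmh(lane: LaneFlow) -> float:
    """Upstream speed of the back of the queue while the signal is red.

    Textbook shock wave between the arriving state (q_a, k_a) and the
    jammed state (0, k_j): |w| = q_a / (k_j - k_a).
    """
    if lane.jam_density_vpkm <= lane.arrival_density_vpkm:
        raise ValueError("jam density must exceed arrival density")
    return lane.arrival_flow_vph / (
        lane.jam_density_vpkm - lane.arrival_density_vpkm
    )


def predicted_queue_m(lane: LaneFlow, initial_queue_m: float,
                      red_time_s: float) -> float:
    """Queue length in metres after the movement waits red_time_s on red."""
    speed_m_per_s = queue_growth_speed_kmh(lane) * 1000.0 / 3600.0
    return initial_queue_m + speed_m_per_s * red_time_s


def needs_reservice(lane: LaneFlow, initial_queue_m: float,
                    red_time_s: float, storage_length_m: float) -> bool:
    """Assumed rule: re-serve if the queue would outgrow the storage length."""
    return predicted_queue_m(lane, initial_queue_m, red_time_s) > storage_length_m


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    lane = LaneFlow(arrival_flow_vph=600.0, arrival_density_vpkm=20.0,
                    jam_density_vpkm=140.0)
    for red in (30.0, 60.0):
        queue = predicted_queue_m(lane, 0.0, red)
        print(red, round(queue, 1), needs_reservice(lane, 0.0, red, 70.0))
```

In this sketch, `red_time_s` stands in for the phase duration the RL agent has just chosen, during which the re-service movement would be held on red. A longer chosen duration yields a longer predicted queue and makes re-service more likely.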

Paper Structure (16 sections, 9 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: An example of an intersection with a protected movement with phase re-service (red). In the top cycle, each phase is served once. In the bottom cycle, the high-demand protected left-turn movement is re-served in phase 3.
  • Figure 2: Demonstration of the shock wave at a single intersection lane over a signal cycle. Yellow transition is ignored for simplicity.
  • Figure 3: Experimental scenarios and their phase sequences. Vehicle movements are shown as arrows, protected ones in green and others in red. Regular and re-service phases are drawn as boxes with solid and dotted lines, respectively.
  • Figure 4: Step-average reward curves for 5 runs. Solid lines are averages and intervals are standard deviations.
  • Figure 5: Density histogram of vehicle delays of the RL agent with and without phase re-service in the freeway ramps Demand 4 scenario.
  • ...and 1 more figures