DynamicLight: Two-Stage Dynamic Traffic Signal Timing

Liang Zhang, Yutong Zhang, Shubin Xie, Jianming Deng, Chen Li

TL;DR

DynamicLight addresses the rigidity of single-stage reinforcement learning for traffic signal control with a two-stage framework that first selects a phase and then selects that phase's duration. A unified Deep Q-Network fuses lane-level features through multi-head attention and outputs both decisions, so phase lengths can vary dynamically rather than following a fixed action interval. On real-world (JN/HZ/NY) and synthetic topologies simulated in CityFlow, DynamicLight achieves lower average travel time (ATT) than Advanced-CoLight and the other baselines, and its variants remain robust when scaled and transferred to other settings. The work highlights practical potential for real-world deployment, and names computational cost and the lack of inter-intersection coordination as directions for future work.

Abstract

Reinforcement learning (RL) is increasingly applied to traffic signal control (TSC) as an effective approach. However, most existing RL methods are confined to a single-stage TSC framework that selects a traffic signal phase at fixed action intervals, which leads to inflexible and less adaptable phase durations. To address these limitations, we introduce a novel two-stage TSC framework named DynamicLight. The framework begins with a phase control strategy that determines the optimal traffic phase, followed by a duration control strategy that determines the corresponding phase duration. Experimental results show that DynamicLight outperforms state-of-the-art TSC models and exhibits exceptional model generalization capabilities. Additionally, the robustness and potential for real-world implementation of DynamicLight are further demonstrated and validated through various DynamicLight variants. The code is released at https://github.com/LiangZhang1996/DynamicLight.
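
The two-stage decision described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `phase_q_values`, `duration_q_values`, the candidate duration set, the discharge rate, and the phase names are all hypothetical, and simple queue-based scores stand in for the learned Q-networks.

```python
from typing import Dict, List, Tuple

# Hypothetical candidate durations in seconds (the real action space is not stated here).
CANDIDATE_DURATIONS: List[int] = [10, 15, 20, 25, 30]


def phase_q_values(queues: Dict[str, int]) -> Dict[str, float]:
    """Stand-in for the phase-control network: score each phase by its queue length."""
    return {phase: float(q) for phase, q in queues.items()}


def duration_q_values(queue: int, discharge_rate: float = 0.5) -> Dict[int, float]:
    """Stand-in for the duration-control network.

    Scores each candidate duration by how closely it matches the time needed
    to discharge the selected phase's queue (assumed rate in vehicles/second).
    """
    needed = queue / discharge_rate
    return {d: -abs(d - needed) for d in CANDIDATE_DURATIONS}


def select_action(queues: Dict[str, int]) -> Tuple[str, int]:
    """Two-stage selection: pick the phase first, then a duration for that phase."""
    q_phase = phase_q_values(queues)
    phase = max(q_phase, key=q_phase.get)      # stage 1: argmax over phases
    q_dur = duration_q_values(queues[phase])   # stage 2 is conditioned on the chosen phase
    duration = max(q_dur, key=q_dur.get)       # stage 2: argmax over durations
    return phase, duration


if __name__ == "__main__":
    # Hypothetical queue lengths per phase.
    print(select_action({"NS_through": 12, "EW_through": 4, "NS_left": 7, "EW_left": 1}))
```

A single-stage controller would stop after the first argmax and hold the phase for a fixed interval; the second argmax is what lets the green time track demand.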

Paper Structure (41 sections, 6 equations, 5 figures, 12 tables, 1 algorithm)

Figures (5)

  • Figure 1: Illustration of a standard intersection structure with four entry and four exit approaches (East, West, South, and North), each featuring three types of lanes (left, straight, and right). Subfigures depict (b) traffic movement signals, (c) signal phases, and (d) state representations for a comprehensive overview.
  • Figure 2: Overview architecture of DynamicLight. (a) The TSC environment facilitates DynamicLight by providing state representations $\mathcal{S}$, executing received actions $\langle a^p, a^d\rangle$, and generating new states $\mathcal{S}^{\prime}$ and rewards $r$. It serves as the essential interface for interaction, enabling the seamless flow of information and feedback between the agent and its environment. These transition tuples $\langle\mathcal{S}, a^p, a^d, r, \mathcal{S}^{\prime} \rangle$ at an intersection are collected as the replay memory. (b) Feature fusion involves acquiring states from the environment and embedding them into lane features. Subsequently, the lane features undergo phase feature fusion through a multi-head self-attention (MHA) mechanism. (c) Phase control utilizes phase features as inputs and employs a deep network to approximate the Q-values. (d) Duration control selects the phase feature corresponding to the predicted phase action in (c) and embeds it to predict the Q-values. The phase action and duration action are determined using argmax operation. Note that the networks in (b) and (c) are updated with mini-batches $\langle\mathcal{S}, a^p, r, \mathcal{S}^{\prime}\rangle$ from the replay memory. Similarly, the networks in (b) and (d) are updated with mini-batches $\langle \mathcal{S}, a^d, r, \mathcal{S}^{\prime}\rangle$.
  • Figure 3: Visual depiction of the number of vehicle entries in both real-world and synthetic datasets.
  • Figure 4: Performance comparison of DynamicLight variants (ATT in seconds).
  • Figure 5: (a) Illustration of three different intersection topologies. (b) Performance comparison of DynamicLight variants with baseline models. (c) Comparison of ATT ratio on synthetic datasets; the error bars represent a 95% confidence interval for the ATT ratio.
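
The replay-memory flow in the Figure 2 caption, where one stored transition $\langle\mathcal{S}, a^p, a^d, r, \mathcal{S}^{\prime}\rangle$ feeds two different mini-batch views, can be sketched as follows. This is an illustrative sketch only: the class and function names are hypothetical, states are plain tuples, and drawing both views from the same sampled batch is an assumption, since the caption does not say whether the two updates share a sample.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Tuple


class Transition(NamedTuple):
    state: Tuple[float, ...]       # S
    phase_action: int              # a^p
    duration_action: int           # a^d
    reward: float                  # r
    next_state: Tuple[float, ...]  # S'


class ReplayMemory:
    """Fixed-capacity buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity: int) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Sample without replacement, capped at the current buffer size.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def phase_batch(batch: List[Transition]) -> List[tuple]:
    """View <S, a^p, r, S'> for updating feature fusion + phase control."""
    return [(t.state, t.phase_action, t.reward, t.next_state) for t in batch]


def duration_batch(batch: List[Transition]) -> List[tuple]:
    """View <S, a^d, r, S'> for updating feature fusion + duration control."""
    return [(t.state, t.duration_action, t.reward, t.next_state) for t in batch]


if __name__ == "__main__":
    memory = ReplayMemory(capacity=100)
    for i in range(10):
        # Made-up values, purely to exercise the buffer.
        memory.push(Transition((float(i),), i % 4, i % 3, -float(i), (float(i + 1),)))
    batch = memory.sample(4, random.Random(0))
    print(len(phase_batch(batch)), len(duration_batch(batch)))
```

Storing both actions in one tuple keeps the two updates aligned on the same state, reward, and next state, so each stage only differs in which action it treats as its own.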