A Parallel Hybrid Action Space Reinforcement Learning Model for Real-world Adaptive Traffic Signal Control

Yuxuan Wang, Meng Long, Qiang Wu, Wei Liu, Jiatian Pi, Xinmin Yang

TL;DR

This work introduces PH-DDPG, a parallel hybrid action space reinforcement learning model for real-world adaptive traffic signal control that jointly optimizes discrete phase choices and continuous phase durations. It advances hybrid-action learning with decoupled parameterized actions, a Gaussian-noise-based action masking mechanism, and an online/offline deployment framework, achieving state-of-the-art results on three real-world datasets. The approach demonstrates robustness to data sources and outperforms baselines, including in dense urban networks, while reducing reliance on expert data. Practical implications include faster deployment and better real-time adaptability in large-scale traffic networks, with future work focusing on multi-agent coordination and broader scenarios.

Abstract

Adaptive traffic signal control (ATSC) can effectively reduce vehicle travel times by dynamically adjusting signal timings, but it remains a critical challenge in real-world scenarios because of the complexity of real-time decision-making under dynamic and uncertain traffic conditions. The burgeoning field of intelligent transportation systems, bolstered by artificial intelligence techniques and extensive data availability, offers new prospects for the implementation of ATSC. In this study, we introduce a parallel hybrid action space reinforcement learning model (PH-DDPG) that optimizes the traffic signal phase and its duration simultaneously, eliminating the sequential decision-making of traditional two-stage models. Our model features a task-specific parallel hybrid action space tailored for adaptive traffic control, which outputs discrete phase selections and their associated continuous duration parameters concurrently, thereby addressing dynamic traffic adaptation through unified parametric optimization. Furthermore, to assess the robustness and effectiveness of this approach, we conducted ablation studies on the use of a random action parameter mask within the critic network, which decouples the parameter space of the individual actions so that each action can use its own preferable parameters. The results of these studies confirm the efficacy of this method and its benefit for real-world applicability.

Paper Structure

This paper contains 23 sections, 14 equations, 9 figures, 4 tables, 1 algorithm.

Figures (9)

  • Figure 1: This figure illustrates the structural differences in reinforcement learning among DDPG, P-DQN, and PH-DDPG. DDPG outputs a single continuous action by maximizing the Q-value for the optimal action parameter. P-DQN focuses on selecting the optimal discrete action under a fixed optimal action parameter. PH-DDPG extends these approaches by computing the Q-value for each discrete action under its respective optimal action parameter. As a result, PH-DDPG generates multiple continuous action parameters in the Actor part, distinguishing it from the other two methods and enabling more flexible and adaptive decision-making in hybrid action spaces.
  • Figure 2: Illustration of an intersection with two different four-phase action sets.
  • Figure 3: The illustration of our proposed PH-DDPG for ATSC.
  • Figure 4: Schematic diagram of the decoupled action space in the PH-DDPG framework. The critic network $Q_{\varphi}(s, \boldsymbol{x}_k)$ is trained using the reconstructed noisy action parameters, where the action parameter mask function $\psi(k, x_k)$ selectively masks the parameters of the chosen action $k$ and introduces Gaussian noise for the parameters of other actions $j \neq k$.
  • Figure 5: Study area road networks with controlled traffic signals (blue dots). (a) 12-intersection network with bidirectional traffic in Jinan; (b) 16-intersection system supporting uni- and bi-directional flows in Hangzhou; (c) 196-signal unidirectional network in Manhattan. Red polygons denote modeled areas.
  • ...and 4 more figures
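
The mechanisms described in the captions of Figures 1 and 4 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `mask_action_params` and `select_hybrid_action`, the zero-mean noise, the value of `noise_std`, and the toy Q-values and normalized duration parameters are all hypothetical. It also assumes one reading of the Figure 4 caption, namely that the chosen action's parameter is kept and the parameters of all other actions are replaced by Gaussian noise before being passed to the critic.

```python
import numpy as np


def mask_action_params(k, params, rng, noise_std=0.1):
    """Return a copy of `params` in which only entry k is kept.

    Every other entry j != k is replaced by a Gaussian sample, so a critic
    fed this vector cannot tie its estimate for action k to the parameters
    of the other actions. Zero-mean noise and `noise_std` are assumptions.
    """
    params = np.asarray(params, dtype=float)
    masked = rng.normal(loc=0.0, scale=noise_std, size=params.shape)
    masked[k] = params[k]
    return masked


def select_hybrid_action(q_values, params):
    """Pick the discrete phase with the highest Q-value and its own parameter.

    `params[k]` is the continuous (duration) parameter the actor proposed
    for phase k; all K parameters are produced in parallel.
    """
    k = int(np.argmax(q_values))
    return k, float(params[k])


rng = np.random.default_rng(0)

# Hypothetical actor output: one normalized duration parameter per phase.
duration_params = np.array([0.2, 0.6, 0.4, 0.8])
# Hypothetical critic output: one Q-value per phase.
q_values = np.array([0.2, 0.9, 0.4, 0.1])

phase, duration = select_hybrid_action(q_values, duration_params)
masked = mask_action_params(phase, duration_params, rng)
print(phase, duration)
```

In this toy run the second phase (index 1) has the highest Q-value, so the agent would execute that phase with its own duration parameter, while the masked vector keeps only that parameter intact.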