A Parallel Hybrid Action Space Reinforcement Learning Model for Real-world Adaptive Traffic Signal Control
Yuxuan Wang, Meng Long, Qiang Wu, Wei Liu, Jiatian Pi, Xinmin Yang
TL;DR
This work introduces PH-DDPG, a parallel hybrid action space reinforcement learning model for real-world adaptive traffic signal control that jointly optimizes discrete phase choices and continuous phase durations. It advances hybrid-action learning with decoupled parameterized actions, a Gaussian-noise-based action masking mechanism, and an online/offline deployment framework, achieving state-of-the-art results on three real-world datasets. The approach demonstrates robustness to data sources and outperforms baselines, including in dense urban networks, while reducing reliance on expert data. Practical implications include faster deployment and better real-time adaptability in large-scale traffic networks, with future work focusing on multi-agent coordination and broader scenarios.
Abstract
Adaptive traffic signal control (ATSC) can effectively reduce vehicle travel times by dynamically adjusting signal timings, but it poses a critical challenge in real-world scenarios due to the complexity of real-time decision-making in dynamic and uncertain traffic conditions. The burgeoning field of intelligent transportation systems, bolstered by artificial intelligence techniques and extensive data availability, offers new prospects for the implementation of ATSC. In this study, we introduce a parallel hybrid action space reinforcement learning model (PH-DDPG) that optimizes the traffic signal phase and its duration simultaneously, eliminating the need for the sequential decision-making seen in traditional two-stage models. Our model features a task-specific parallel hybrid action space tailored for adaptive traffic control, which outputs discrete phase selections and their associated continuous duration parameters concurrently, thereby addressing dynamic traffic adaptation through unified parametric optimization. Furthermore, to ascertain the robustness and effectiveness of this approach, we conducted ablation studies on the use of a random action parameter mask within the critic network, which decouples the parameter space of individual actions and so facilitates the use of preferable parameters for each action. The results of these studies confirm the efficacy of this method and its enhanced real-world applicability.
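To make the idea of a parallel hybrid action concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical actor that emits one score and one duration per phase in a single pass; the function names, the four-phase example values, and the choice to overwrite non-selected durations with zero-mean Gaussian noise are all illustrative assumptions about how a random action parameter mask might look.

```python
import random


def select_hybrid_action(phase_scores, durations):
    """Pick the discrete phase and its paired continuous duration in one step.

    Both lists are assumed to come from a single actor forward pass, so no
    second decision stage is needed to obtain the duration.
    """
    if len(phase_scores) != len(durations):
        raise ValueError("need exactly one duration per phase")
    phase = max(range(len(phase_scores)), key=lambda i: phase_scores[i])
    return phase, durations[phase]


def mask_unselected_durations(durations, phase, rng, sigma=1.0):
    """Keep the chosen phase's duration and replace the others with noise.

    Assumption: the mask is modeled here as zero-mean Gaussian noise, so a
    critic fed this vector cannot rely on parameters of phases not taken.
    """
    return [
        d if i == phase else rng.gauss(0.0, sigma)
        for i, d in enumerate(durations)
    ]


# Hypothetical actor output for a four-phase intersection.
phase_scores = [0.10, 0.70, 0.15, 0.05]
durations = [12.0, 25.0, 18.0, 10.0]  # seconds, one per phase

phase, duration = select_hybrid_action(phase_scores, durations)
masked = mask_unselected_durations(durations, phase, random.Random(0))
```

Here `phase` and `duration` form the action sent to the signal controller, while `masked` stands in for the parameter vector a critic would see. The noise scale `sigma` is an arbitrary placeholder rather than a value taken from the paper.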
