Generalized Phase Pressure Control Enhanced Reinforcement Learning for Traffic Signal Control
Xiao-Cheng Liao, Yi Mei, Mengjie Zhang, Xiang-Ling Chen
TL;DR
This work tackles the challenge of designing theoretically grounded traffic state representations for traffic signal control and achieving stable, high-performance policies. It introduces Generalized Phase Pressure (G2P), a pressure-based control framework that accounts for absolute and relative traffic conditions across multi-lane intersections, and extends pressure theory to multi-homogeneous-lane networks. The authors derive a generalized phase pressure, propose an RL template (G2P-XLight) with two variants (G2P-MPLight and G2P-CoLight), and demonstrate substantial gains over state-of-the-art heuristic and learning-based methods on real-world datasets in CityFlow. The results indicate improved performance, stability, and data efficiency, with G2P-CoLight generalizing well to unseen Manhattan scenarios; the code is available for reproducibility.
Abstract
Appropriate traffic state representation is crucial for learning traffic signal control policies. However, most current traffic state representations are designed heuristically, with insufficient theoretical support. In this paper, we (1) develop a flexible, efficient, and theoretically grounded method, namely generalized phase pressure (G2P) control, which considers only simple lane features to decide which phase to actuate; (2) extend the pressure control theory to a general form for multi-homogeneous-lane road networks based on queueing theory; (3) design a new traffic state representation based on the generalized phase state features from G2P control; and (4) develop a reinforcement learning (RL)-based algorithm template named G2P-XLight, and two RL algorithms, G2P-MPLight and G2P-CoLight, by combining the generalized phase state representation with MPLight and CoLight, two well-performing RL methods for learning traffic signal control policies. Extensive experiments on multiple real-world datasets demonstrate that G2P control outperforms the state-of-the-art (SOTA) heuristic method in the transportation field and other recent human-designed heuristic methods, and that the newly proposed G2P-XLight significantly outperforms SOTA learning-based approaches. Our code is available online.
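For readers unfamiliar with pressure-based control, the following is a minimal sketch of the classic max-pressure phase-selection rule from the pressure control theory that G2P extends. It is not the G2P formulation, whose generalized phase pressure is not given in this abstract. The sketch assumes that a phase is a list of (incoming lane, outgoing lane) movements and that the only lane feature is a queue count. The lane names, queue values, phase layout, and helper names (`movement_pressure`, `phase_pressure`, `select_phase`) are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a movement is an (incoming_lane, outgoing_lane) pair.
Movement = Tuple[str, str]


def movement_pressure(queues: Dict[str, int], movement: Movement) -> int:
    """Classic max-pressure movement pressure: upstream queue minus downstream queue."""
    incoming, outgoing = movement
    return queues[incoming] - queues[outgoing]


def phase_pressure(queues: Dict[str, int], phase: List[Movement]) -> int:
    """Pressure of a phase: sum of the pressures of the movements it serves."""
    return sum(movement_pressure(queues, m) for m in phase)


def select_phase(queues: Dict[str, int], phases: Dict[str, List[Movement]]) -> str:
    """Actuate the phase with the largest pressure (ties go to the first phase listed)."""
    return max(phases, key=lambda name: phase_pressure(queues, phases[name]))


# Hypothetical single intersection with two through phases.
queues = {
    "N_in": 8, "S_in": 6, "E_in": 3, "W_in": 2,    # vehicles queued on incoming lanes
    "N_out": 1, "S_out": 2, "E_out": 4, "W_out": 0,  # vehicles on outgoing lanes
}
phases = {
    "NS_through": [("N_in", "S_out"), ("S_in", "N_out")],
    "EW_through": [("E_in", "W_out"), ("W_in", "E_out")],
}

print(select_phase(queues, phases))  # NS_through: pressure (8-2)+(6-1)=11 vs. (3-0)+(2-4)=1
```

Here the north-south phase is actuated because it relieves the larger imbalance between upstream and downstream queues. G2P, as described above, generalizes this idea to multi-homogeneous-lane networks and uses the resulting phase features as the state representation in G2P-XLight.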
