MacLight: Multi-scene Aggregation Convolutional Learning for Traffic Signal Control
Sunbowen Lee, Hongqin Lyu, Yicheng Gong, Yingying Sun, Chao Deng
TL;DR
MacLight introduces a CNN-based variational autoencoder to obtain a compact global state embedding and couples it with local intersection features within a PPO framework, enabling fast and robust traffic signal control. The method avoids graph-based parallelization bottlenecks by using a multi-scene aggregation matrix and dynamic SUMO-generated traffic scenarios, including emergency road events. Empirical results across Normal, Peak, and Block traffic conditions demonstrate improved stability and time efficiency relative to several baselines, with on-policy training achieving substantial speedups. The work offers a practical, scalable approach for real-time TSC and provides a flexible dynamic testing environment to evaluate policy robustness in evolving traffic conditions.
Abstract
Reinforcement learning methods have produced promising traffic signal control policies that can be trained on large road networks. Current SOTA methods model road networks as topological graph structures, incorporate graph attention into deep Q-learning, and merge local and global embeddings to improve the policy. However, graph-based methods are difficult to parallelize, resulting in substantial time overhead. Moreover, no comparable study has run experiments on dynamic traffic systems, so existing evaluations remain far from real-world conditions. In this context, we propose Multi-Scene Aggregation Convolutional Learning for traffic signal control (MacLight), which offers faster training and more stable performance. Our approach consists of two main components. The first is the global representation, where we use a variational autoencoder to compress the global state into a compact embedding. The second employs the proximal policy optimization algorithm as the backbone, allowing value evaluation to consider both local features and the global embedding. This backbone significantly reduces time overhead and ensures stable policy updates. We validated our method across multiple traffic scenarios under both static and dynamic traffic systems. Experimental results demonstrate that, compared to general and domain-specific SOTA methods, our approach achieves superior stability, improved convergence levels, and the highest time efficiency. The code is available at https://github.com/Aegis1863/MacLight.
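The two components described above can be illustrated with a minimal sketch. It is not the authors' implementation: the CNN encoder and the actor and critic networks are omitted, and all function names, the clipping value of 0.2, and the simple concatenation of local and global features are assumptions made for illustration. It shows only the standard building blocks the abstract names: the VAE reparameterization step and KL term, the joining of local features with a global embedding as critic input, and the PPO clipped surrogate loss.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample a latent vector z = mu + sigma * eps (standard VAE trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, sigma^2) and N(0, I), summed over dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def critic_input(local_features, global_embedding):
    """Assumed fusion: concatenate per-intersection features with the
    shared global embedding before value evaluation."""
    return list(local_features) + list(global_embedding)


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (negated objective, averaged over samples)."""
    total = 0.0
    for nlp, olp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(nlp - olp)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical encoder outputs for a 4-dimensional global embedding.
    mu, log_var = [0.1, -0.2, 0.0, 0.3], [-1.0, -1.0, -1.0, -1.0]
    z = reparameterize(mu, log_var, rng)
    # Hypothetical local observation of one intersection (e.g. queue lengths).
    state = critic_input([3.0, 0.0, 5.0], z)
    print(len(state), kl_to_standard_normal(mu, log_var))
```

In this sketch every intersection agent would receive the same global embedding, so the encoder runs once per step regardless of the number of intersections; whether MacLight fuses the two inputs by plain concatenation is an assumption here.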
