X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner
Haoyuan Jiang, Ziyue Li, Hua Wei, Xuantang Xiong, Jingqing Ruan, Jiaming Lu, Hangyu Mao, Rui Zhao
TL;DR
X-Light tackles cross-city transferability in multi-agent traffic signal control (TSC) with a Transformer-on-Transformer (TonT) architecture. A Lower Transformer fuses the full Markov Decision Process (MDP) trajectories $(o,a,r)$ of a target intersection and its neighbors, and an Upper Transformer learns cross-city decision dynamics from historical trajectories across multiple scenarios, aided by GPI unification and multi-scenario co-training. Key contributions include the first TonT-based meta multi-agent reinforcement learning (MARL) approach for TSC, a residual connection before the actor-critic, a dynamic predictor for environment dynamics, and strong zero-shot transfer gains (up to 16.3% in Grid5×5), along with improved non-transfer performance and faster convergence. Results in SUMO simulations indicate robust transferability and practical potential for scalable, city-level TSC without retraining for each new city.
Abstract
Current reinforcement learning-based approaches have significantly improved traffic light control through better cooperation among multiple traffic lights. However, one issue persists: how can we obtain a multi-agent traffic signal control algorithm that transfers well across diverse cities? In this paper, we propose a Transformer on Transformer (TonT) model for cross-city meta multi-agent traffic signal control, named X-Light. We input the full Markov Decision Process trajectories: the Lower Transformer aggregates the states, actions, and rewards of the target intersection and its neighbors within a city, and the Upper Transformer learns the general decision trajectories across different cities. This dual-level approach strengthens the model's generalization and transferability. Notably, when transferred directly to unseen scenarios, X-Light surpasses all baseline methods by 7.91% on average, and by up to 16.3% in some cases.
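The two-level data flow described above can be illustrated with a minimal sketch. It is not the X-Light implementation: it assumes single-head attention with identity projections (no learned weights, no positional encodings), places the target intersection at index 0 of each time step, and uses made-up toy numbers. The function names `self_attention`, `lower_fuse`, and `upper_context` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, causal=False):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), so nothing is learned here.
    """
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        # With causal=True, token i attends only to tokens 0..i.
        keys = tokens[: i + 1] if causal else tokens
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)])
    return out


def lower_fuse(step):
    """Lower level: fuse the (o, a, r) tokens of the target and its neighbors.

    `step` is a list of (observation, action, reward) tuples for one time
    step; index 0 is assumed to be the target intersection.
    """
    tokens = [list(o) + [float(a), float(r)] for (o, a, r) in step]
    return self_attention(tokens)[0]


def upper_context(history):
    """Upper level: attend causally over the fused embeddings of past steps."""
    fused = [lower_fuse(step) for step in history]
    return self_attention(fused, causal=True)[-1]


# Toy history: 2 time steps, each with a target and 2 neighbors.
history = [
    [([0.2, 0.5], 1, -0.3), ([0.1, 0.4], 0, -0.2), ([0.6, 0.0], 2, -0.5)],
    [([0.3, 0.4], 0, -0.2), ([0.2, 0.2], 1, -0.1), ([0.5, 0.1], 2, -0.4)],
]
context = upper_context(history)
print(len(context))
```

In this sketch the returned `context` vector stands in for the representation that would feed a downstream policy; a trained model would replace the identity projections with learned ones.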
