X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner

Haoyuan Jiang; Ziyue Li; Hua Wei; Xuantang Xiong; Jingqing Ruan; Jiaming Lu; Hangyu Mao; Rui Zhao

TL;DR

X-Light tackles the cross-city transferability problem in multi-agent traffic signal control by deploying a Transformer-on-Transformer (TonT) architecture. The framework uses a Lower Transformer to fuse full MDP trajectories $(o,a,r)$ of a target intersection and its neighbors, and an Upper Transformer to learn cross-city decision dynamics from historical trajectories across multiple scenarios, aided by GPI unification and multi-scenario co-training. Key contributions include the first TonT-based meta MARL approach for TSC, a residual connection before the actor-critic, a dynamic predictor for environment dynamics, and strong zero-shot transfer gains (up to 16.3% in Grid5×5) along with improved non-transfer performance and convergence. The results on SUMO simulations demonstrate robust transferability and practical potential for scalable, city-level TSC without retraining for new locales.

Abstract

The effectiveness of traffic light control has been significantly improved by current reinforcement learning-based approaches via better cooperation among multiple traffic lights. However, an issue persists: how to obtain a multi-agent traffic signal control algorithm with remarkable transferability across diverse cities? In this paper, we propose a Transformer on Transformer (TonT) model for cross-city meta multi-agent traffic signal control, named X-Light. We input the full Markov Decision Process trajectories: the Lower Transformer aggregates the states, actions, and rewards among the target intersection and its neighbors within a city, and the Upper Transformer learns the general decision trajectories across different cities. This dual-level approach bolsters the model's generalization and transferability. Notably, when directly transferring to unseen scenarios, X-Light surpasses all baseline methods by +7.91% on average, and by up to +16.3% in some cases, yielding the best results.
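The two-level aggregation described in the abstract can be illustrated with a toy, dependency-free sketch. It is not the authors' implementation: it assumes the observations, actions, and rewards are already embedded as vectors of equal length, uses single-head attention with identity projections (no learned weights), and mean-pools the lower-stage output. The function names `lower_stage`, `upper_stage`, `tont_encode`, and `make_step` are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.
    Assumption: identity Q/K/V projections, i.e. no learned weights."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out

def mean_pool(tokens):
    """Average a list of equal-length vectors into one vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]

def lower_stage(step):
    """One time step: a list of (o, a, r) triples, target first, then
    neighbors. Every o, a, r token attends to every other token, so one
    intersection's o can attend to another intersection's a."""
    tokens = [vec for (o, a, r) in step for vec in (o, a, r)]
    return mean_pool(self_attention(tokens))  # pooling is an assumption

def upper_stage(history):
    """Attend over the per-step vectors and return the latest step."""
    return self_attention(history)[-1]

def tont_encode(trajectory):
    """Lower stage per time step, then upper stage over time."""
    return upper_stage([lower_stage(step) for step in trajectory])

def make_step(t, n_intersections=3, d=4):
    """Hypothetical, deterministic stand-in for embedded (o, a, r) data."""
    return [([0.1 * (t + i + j) for j in range(d)],                 # o
             [1.0 if j == (t + i) % d else 0.0 for j in range(d)],  # a
             [-0.5 * (i + 1)] * d)                                  # r
            for i in range(n_intersections)]

trajectory = [make_step(t) for t in range(3)]  # 3 steps, target + 2 neighbors
embedding = tont_encode(trajectory)
print(len(embedding))
```

In this sketch the resulting vector would stand in for the input to an actor-critic head. A practical version would use learned projections, multiple heads, and positional information, none of which are modeled here.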

Paper Structure (33 sections, 12 equations, 11 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Meta Reinforcement Learning-based TSC
Transformers in RL and Other Fields
Methodology
Overview
Lower Transformer
Upper Transformer
Actor-Critic
Multi-scenario Co-Training
Experiments
Datasets
Baselines
Evaluation Metrics
Results
...and 18 more sections

Figures (11)

Figure 1: (a) X-Light takes the MDP $(o,a,r)$ trajectories of the target intersection and its neighbors; (b) the Lower Transformer learns attention over all of the $o$, $a$, and $r$ tokens, so that, e.g., one intersection's $o$ may attend strongly to another intersection's $a$; (c) the Upper Transformer learns attention over time across all the different scenarios.
Figure 2: Our method is co-trained with intersections' MDPs from various scenarios: (a) a GPI module unifying all the scenarios, (b) the proposed TonT Encoder, and (c) an actor-critic to make a decision. The TonT Encoder contains (b1) a Lower Transformer aggregating the $o$, $a$, and $r$ among the target and its neighbors and (b2) an Upper Transformer learning general decisions from multi-scenario historical MDPs.
Figure 3: Trip time during training for the top 3 methods.
Figure 4: Ablation of each component in X-Light. The final solution attained the shortest trip time among all ablation experiments, showcasing the effectiveness of our design components.
Figure 5: Impact of the number of co-training scenarios. As the number of scenarios increases, performance improves.
...and 6 more figures
