OffLight: An Offline Multi-Agent Reinforcement Learning Framework for Traffic Signal Control
Rohit Bokade, Xiaoning Jin
TL;DR
OffLight tackles offline multi-agent reinforcement learning for traffic signal control by explicitly modeling heterogeneous behavior policies with a Gaussian mixture variational graph autoencoder (GMM-VGAE). It integrates importance sampling to correct for distributional shift and return-based prioritized sampling to focus on high-quality experiences, yielding robust policy learning from mixed-policy datasets. Empirical results across real-world networks show reductions of up to 7.8% in average travel time and 11.2% in queue length, with ablations confirming the value of each component. The framework is scalable, adaptable to existing offline RL algorithms, and reduces the risks associated with online exploration in urban traffic environments.
Abstract
Efficient traffic signal control (TSC) is essential for urban mobility, but traditional systems struggle to handle the complexity of real-world traffic. Multi-agent Reinforcement Learning (MARL) offers adaptive solutions, but online MARL requires extensive interactions with the environment, making it costly and impractical. Offline MARL mitigates these challenges by using historical traffic data for training but faces significant difficulties with heterogeneous behavior policies in real-world datasets, where mixed-quality data complicates learning. We introduce OffLight, a novel offline MARL framework designed to handle heterogeneous behavior policies in TSC datasets. OffLight utilizes a Gaussian Mixture Variational Graph Autoencoder (GMM-VGAE) to capture the diverse distribution of behavior policies from local observations. To improve learning efficiency, OffLight incorporates Importance Sampling (IS) to correct for distributional shifts and Return-Based Prioritized Sampling (RBPS) to focus on high-quality experiences. Extensive experiments across real-world urban traffic scenarios show that OffLight outperforms existing offline RL methods, achieving up to a 7.8% reduction in average travel time and an 11.2% decrease in queue length. Ablation studies confirm the effectiveness of OffLight's components in handling heterogeneous data and improving policy performance. These results highlight OffLight's scalability and potential to improve urban traffic management without the risks of online learning.
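
To make the two sampling ideas named above concrete, the following is a minimal sketch, not OffLight's actual implementation. It assumes that RBPS can be approximated by sampling episodes with probability proportional to a min-max scaled return, and that IS uses a clipped ratio of target-policy to behavior-policy action probabilities. The function names, the `floor` and `clip` parameters, and the example returns are all hypothetical; in OffLight the behavior-policy probability would come from the GMM-VGAE estimate, whereas here it is simply passed in as a number.

```python
import random


def return_based_priorities(episode_returns, floor=0.05):
    """Turn episode returns into sampling probabilities.

    Returns are min-max scaled into [floor, 1] so that the worst episode
    keeps a small, non-zero chance of being sampled (assumed design choice).
    """
    lo, hi = min(episode_returns), max(episode_returns)
    span = hi - lo
    if span == 0:
        # All episodes are equally good: fall back to uniform sampling.
        scaled = [1.0] * len(episode_returns)
    else:
        scaled = [floor + (1.0 - floor) * (r - lo) / span for r in episode_returns]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weight(target_prob, behavior_prob, clip=10.0, eps=1e-8):
    """Clipped importance ratio pi_target(a|s) / pi_behavior(a|s)."""
    ratio = target_prob / max(behavior_prob, eps)
    return min(ratio, clip)


def sample_episodes(episode_returns, k, rng):
    """Draw k episode indices, favoring episodes with higher return."""
    probs = return_based_priorities(episode_returns)
    return rng.choices(range(len(episode_returns)), weights=probs, k=k)


if __name__ == "__main__":
    # Hypothetical returns, e.g. negative total travel time per episode.
    returns = [-120.0, -90.0, -60.0]
    rng = random.Random(0)
    print(return_based_priorities(returns))
    print(sample_episodes(returns, k=5, rng=rng))
    print(importance_weight(target_prob=0.5, behavior_prob=0.25))
```

In this sketch the prioritization decides which episodes are replayed, while the importance weight would rescale each sampled transition's loss; clipping the ratio is a common way to limit variance when the estimated behavior probability is very small.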
