OffLight: An Offline Multi-Agent Reinforcement Learning Framework for Traffic Signal Control

Rohit Bokade, Xiaoning Jin

TL;DR

OffLight tackles offline multi-agent reinforcement learning for traffic signal control by explicitly modeling heterogeneous behavior policies with a Gaussian mixture variational graph autoencoder (GMM-VGAE). It integrates importance sampling to correct distributional shifts and return-based prioritized sampling to focus on high-quality experiences, resulting in robust policy learning from mixed-policy datasets. Empirical results across real-world networks show reductions of up to 7.8% in average travel time and 11.2% in queue length, with ablations confirming the value of each component. The framework is scalable, adaptable to existing offline RL algorithms, and reduces the risks associated with online exploration in urban traffic environments.

Abstract

Efficient traffic signal control (TSC) is essential for urban mobility, but traditional systems struggle to handle the complexity of real-world traffic. Multi-agent Reinforcement Learning (MARL) offers adaptive solutions, but online MARL requires extensive interactions with the environment, making it costly and impractical. Offline MARL mitigates these challenges by using historical traffic data for training but faces significant difficulties with heterogeneous behavior policies in real-world datasets, where mixed-quality data complicates learning. We introduce OffLight, a novel offline MARL framework designed to handle heterogeneous behavior policies in TSC datasets. To improve learning efficiency, OffLight incorporates Importance Sampling (IS) to correct for distributional shifts and Return-Based Prioritized Sampling (RBPS) to focus on high-quality experiences. OffLight utilizes a Gaussian Mixture Variational Graph Autoencoder (GMM-VGAE) to capture the diverse distribution of behavior policies from local observations. Extensive experiments across real-world urban traffic scenarios show that OffLight outperforms existing offline RL methods, achieving up to a 7.8% reduction in average travel time and an 11.2% decrease in queue length. Ablation studies confirm the effectiveness of OffLight's components in handling heterogeneous data and improving policy performance. These results highlight OffLight's scalability and potential to improve urban traffic management without the risks of online learning.
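
The two sampling ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the exact weighting formulas are not given in this summary, so the softmax over min-max normalized returns, the `temperature` parameter, the clipping threshold, and all function names below are assumptions chosen for illustration. In OffLight the behavior-policy probability would come from the GMM-VGAE estimate; here it is simply passed in as a number.

```python
import math
import random


def rbps_probabilities(episode_returns, temperature=1.0):
    """Turn episodic returns into sampling probabilities.

    Assumption: softmax over min-max normalized returns, so episodes
    with higher return are drawn more often. Normalizing first keeps
    the result well behaved when returns are negative, as is common
    for travel-time or queue-length rewards.
    """
    lo, hi = min(episode_returns), max(episode_returns)
    span = (hi - lo) or 1.0  # avoid division by zero when all returns are equal
    scores = [math.exp(((r - lo) / span) / temperature) for r in episode_returns]
    total = sum(scores)
    return [s / total for s in scores]


def importance_weight(target_prob, behavior_prob, clip=10.0, eps=1e-8):
    """Clipped importance ratio pi(a|o) / mu_hat(a|o).

    Assumption: clipping is used here only to bound the variance of
    the ratio when the estimated behavior probability is tiny.
    """
    return min(target_prob / max(behavior_prob, eps), clip)


def sample_episode_indices(episode_returns, batch_size, rng=None, temperature=1.0):
    """Draw episode indices with return-based priorities."""
    rng = rng or random.Random(0)
    probs = rbps_probabilities(episode_returns, temperature)
    return rng.choices(range(len(episode_returns)), weights=probs, k=batch_size)
```

In a training loop, one would draw a batch with `sample_episode_indices` and scale each transition's loss term by `importance_weight`; how OffLight actually combines the two weights is not specified in this summary.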

Paper Structure

This paper contains 26 sections, 13 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: General Offline MARL Framework for Traffic Signal Control
  • Figure 2: OffLight Architecture: Integrating GMM-VGAE with IS and RBPS for offline MARL in traffic signal control
  • Figure 3: Distribution of Episodic Returns in mixed-policy datasets. It highlights the variability and heterogeneity in the offline dataset, with some episodes achieving high returns while others are suboptimal. RBPS targets this imbalance by prioritizing episodes with higher returns, ensuring that learning is focused on successful traffic control strategies.
  • Figure 4: Traffic networks used in the experiments: (a) Jinan, (b) Hangzhou, and (c) Manhattan, illustrating the varying scales and complexities of the test scenarios.
  • Figure 5: Performance Comparison of OffLight (CQL) and OffLight (TD3+BC) on Mixed Data showing the improvements in average travel time (ATT) across different traffic demand levels
  • ...and 4 more figures