
Unicorn: A Universal and Collaborative Reinforcement Learning Approach Towards Generalizable Network-Wide Traffic Signal Control

Yifeng Zhang, Yilin Liu, Ping Gong, Peizhuo Li, Mingfeng Fan, Guillaume Sartoretti

TL;DR

Unicorn tackles generalizable network-wide ATSC by unifying intersection state-action representations around traffic movements and introducing two core modules: Universal Traffic Representation (UTR) for general feature extraction and Intersection Specifics Representation (ISR) for topology-aware latent modeling via a variational autoencoder. A contrastive learning objective refines the latent representations, while collaborative policy optimization uses an attention mechanism over neighboring actions to coordinate adjacent intersections under a PPO framework. Evaluations across eight datasets show that Unicorn consistently outperforms baselines, including both independent and shared-parameter MARL methods, on metrics such as queue length, average speed, and trip delay, and joint training further yields robust generalization. These findings suggest substantial practical value for deployment in real urban networks, enabling scalable, adaptable, and cooperative traffic signal control across heterogeneous topologies and demands.
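
The TL;DR combines two generic ingredients: attention over neighboring agents and PPO. The Python sketch below illustrates standard versions of both, assuming scaled dot-product attention over hypothetical neighbor state-action embeddings and the usual PPO clipped surrogate objective. The function names, input shapes, and clipping value are illustrative assumptions; this is not Unicorn's implementation, whose exact attention design and loss weighting are not specified on this page.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_neighbors(query: Sequence[float],
                        neighbor_keys: Sequence[Sequence[float]],
                        neighbor_values: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention of one intersection over its neighbors.

    query: hypothetical embedding of the ego intersection.
    neighbor_keys / neighbor_values: hypothetical embeddings of each
    neighbor's state-action pair, one row per neighbor.
    Returns the attention-weighted sum of the neighbor values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale
              for key in neighbor_keys]
    weights = softmax(scores)
    dim = len(neighbor_values[0])
    return [sum(w * value[i] for w, value in zip(weights, neighbor_values))
            for i in range(dim)]


def ppo_clipped_objective(ratios: Sequence[float],
                          advantages: Sequence[float],
                          clip_eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate: E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)].

    ratios: pi_new(a|s) / pi_old(a|s) for each sampled action.
    advantages: advantage estimates for the same samples.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)


# Hypothetical usage: the ego query aligns with the first neighbor's key.
context = attend_to_neighbors([1.0, 0.0],
                              [[1.0, 0.0], [0.0, 1.0]],
                              [[2.0, 0.0], [0.0, 2.0]])
objective = ppo_clipped_objective([1.5, 0.5], [1.0, -1.0])
```

In this toy call the ego agent places more attention weight on the neighbor whose key matches its query, and the clipped term caps how much a large probability ratio can raise the objective, which is what keeps PPO updates conservative.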

Abstract

Adaptive traffic signal control (ATSC) is crucial in reducing congestion, maximizing throughput, and improving mobility in rapidly growing urban areas. Recent advancements in parameter-sharing multi-agent reinforcement learning (MARL) have greatly enhanced the scalable and adaptive optimization of complex, dynamic flows in large-scale homogeneous networks. However, the inherent heterogeneity of real-world traffic networks, with their varied intersection topologies and interaction dynamics, poses substantial challenges to achieving scalable and effective ATSC across different traffic scenarios. To address these challenges, we present Unicorn, a universal and collaborative MARL framework designed for efficient and adaptable network-wide ATSC. Specifically, we first propose a unified approach to map the states and actions of intersections with varying topologies into a common structure based on traffic movements. Next, we design a Universal Traffic Representation (UTR) module with a decoder-only network for general feature extraction, enhancing the model's adaptability to diverse traffic scenarios. Additionally, we incorporate an Intersection Specifics Representation (ISR) module, which uses variational inference to identify key latent vectors representing each intersection's unique topology and traffic dynamics. To further refine these latent representations, we employ a contrastive learning approach in a self-supervised manner, which enables better differentiation of intersection-specific features. Moreover, we integrate the state-action dependencies of neighboring agents into policy optimization, which effectively captures dynamic agent interactions and facilitates efficient regional collaboration. Our results show that Unicorn outperforms other methods across various evaluation metrics, highlighting its potential in complex, dynamic traffic networks.
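
The ISR module relies on variational inference refined by a contrastive objective. A minimal Python sketch of the generic pieces follows: the VAE reparameterization trick, the closed-form KL divergence to a standard normal prior, and an InfoNCE-style contrastive loss. It assumes, purely for illustration, that two latents from the same intersection form a positive pair and latents from other intersections serve as negatives; the paper's actual pairing scheme, temperature, and network architecture are not given here, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """InfoNCE loss: -log(exp(s_pos / t) / sum over all candidates of exp(s / t))."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# Hypothetical usage with made-up encoder outputs and latent vectors.
rng = random.Random(0)
mu, log_var = [0.5, -0.2], [-1.0, -1.0]
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
loss = info_nce(anchor=[1.0, 0.0], positive=[0.9, 0.1],
                negatives=[[0.0, 1.0], [-1.0, 0.0]])
```

The KL term keeps each intersection's latent close to the prior, while the contrastive term pulls latents of the same intersection together and pushes those of different intersections apart, which is one way to obtain the "better differentiation of intersection-specific features" the abstract describes.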

Paper Structure

This paper contains 31 sections, 11 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: (a) illustrates the two main sources of heterogeneity in complex real-world traffic networks with multiple intersections: heterogeneity in intersection topology structures and traffic demands (internal), and heterogeneity in interconnection relationships between connected intersections (external). (b) provides an overview of our proposed Unicorn framework, which consists of three key components: (1) a Universal Traffic Representation (UTR) module for unified state-action representation and general feature extraction across intersections with diverse topology structures, (2) an Intersection Specifics Representation (ISR) module to enhance the learning of diverse intersection-specific features within the traffic network, and (3) a Collaborative Learning algorithm that strengthens neighborhood coordination and collaboration by leveraging unified state-action dependencies with neighboring intersections.
  • Figure 2: (a) Overview of a typical 3-arm intersection, which consists of six incoming lanes, six outgoing lanes, 12 traffic movements, and three traffic phases. (b) The top portion illustrates how each traffic movement is assembled by linking an incoming lane with an outgoing lane. The bottom part shows that each phase consists of a group of activated, non-conflicting traffic movements, thereby establishing the relationship between phases and movements. (c) Illustration of the proposed traffic state vector and traffic phase vector (comprising multiple phase state vectors), both constructed based on ordered traffic movements.
  • Figure 3: A detailed illustration of the input vectors (left), the UTR module's General Feature Extraction (GFE) network (top), and ISR module's Intersection-Specific Extraction (ISE) network (bottom), along with the feature integration for policy function and value function outputs (right).
  • Figure 4: The eight traffic datasets used for performance evaluation are displayed from left to right and top to bottom: Arterial 4×4, Grid 4×4, Cologne, Ingolstadt, Fenglin, Nanshan, Grid 5×5, and Monaco. Typical intersection structures for each traffic network are highlighted within the circles.
  • Figure 5: Variation of evaluation metrics (average queue length, trip completion rate, and average trip time) over simulation time (3600 seconds) on the Grid 5×5 map of the MA2C dataset (chu2019multi). Here, solid lines represent the average across 10 testing episodes, and shaded areas indicate variance.
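
Figure 2 describes the movement-based representation: each traffic movement links an incoming lane to an outgoing lane, and each phase is a group of non-conflicting movements. The Python sketch below builds ordered movements, a per-movement state vector, and binary phase masks for a toy 3-arm intersection with six incoming lanes, six outgoing lanes, 12 movements, and three phases, matching the counts in Figure 2(a). The lane names, the single queue-length feature, the split-phase grouping, and zero-padding to a fixed MAX_MOVEMENTS so that intersections of different sizes share one shape are all illustrative assumptions, not the paper's exact encoding.

```python
from typing import Dict, List, Tuple

Movement = Tuple[str, str]  # (incoming lane id, outgoing lane id)
MAX_MOVEMENTS = 16  # hypothetical common size shared by all intersections


def build_movements(connections: Dict[str, List[str]]) -> List[Movement]:
    """Pair every incoming lane with each outgoing lane it feeds, in a fixed order."""
    return sorted((inc, out) for inc, outs in connections.items() for out in outs)


def state_vector(movements: List[Movement],
                 features: Dict[Movement, List[float]],
                 feat_dim: int) -> List[List[float]]:
    """One feature row per ordered movement, zero-padded to MAX_MOVEMENTS rows."""
    rows = [list(features[m]) for m in movements]
    rows += [[0.0] * feat_dim for _ in range(MAX_MOVEMENTS - len(rows))]
    return rows


def phase_vector(movements: List[Movement],
                 phases: List[List[Movement]]) -> List[List[int]]:
    """One binary mask per phase marking which ordered movements it activates."""
    index = {m: i for i, m in enumerate(movements)}
    masks = []
    for phase in phases:
        mask = [0] * MAX_MOVEMENTS
        for m in phase:
            mask[index[m]] = 1
        masks.append(mask)
    return masks


# Toy 3-arm intersection: each arm has two incoming and two outgoing lanes,
# and each incoming lane feeds both outgoing lanes of one other arm.
arms = ["A", "B", "C"]
connections: Dict[str, List[str]] = {}
for i, arm in enumerate(arms):
    first_dest, second_dest = arms[(i + 1) % 3], arms[(i + 2) % 3]
    connections[f"in_{arm}0"] = [f"out_{first_dest}0", f"out_{first_dest}1"]
    connections[f"in_{arm}1"] = [f"out_{second_dest}0", f"out_{second_dest}1"]

movements = build_movements(connections)
# Split phasing (assumption): each phase serves all movements of one arm.
phases = [[m for m in movements if m[0].startswith(f"in_{arm}")] for arm in arms]
queues = {m: [float(k)] for k, m in enumerate(movements)}  # hypothetical queue lengths
state = state_vector(movements, queues, feat_dim=1)
phase_masks = phase_vector(movements, phases)
```

Because the state rows and the phase masks are indexed by the same ordered movements, choosing an action amounts to selecting one phase mask, and under this assumed padding scheme a larger intersection would simply fill more of the same fixed-size slots.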