Table of Contents
Fetching ...

CROSS: A Mixture-of-Experts Reinforcement Learning Framework for Generalizable Large-Scale Traffic Signal Control

Xibei Chen, Yifeng Zhang, Yuxiang Xiao, Mingfeng Fan, Maonan Wang, Guillaume Sartoretti

Abstract

Recent advances in robotics, automation, and artificial intelligence have enabled urban traffic systems to operate with increasing autonomy towards future smart cities, powered in part by the development of adaptive traffic signal control (ATSC), which dynamically optimizes signal phases to mitigate congestion and optimize traffic. However, achieving effective and generalizable large-scale ATSC remains a significant challenge due to the diverse intersection topologies and highly dynamic, complex traffic demand patterns across the network. Existing RL-based methods typically use a single shared policy for all scenarios, whose limited representational capacity makes it difficult to capture diverse traffic dynamics and generalize to unseen environments. To address these challenges, we propose CROSS, a novel Mixture-of-Experts (MoE)-based decentralized RL framework for generalizable ATSC. We first introduce a Predictive Contrastive Clustering (PCC) module that forecasts short-term state transitions to identify latent traffic patterns, followed by clustering and contrastive learning to enhance pattern-level representation. We further design a Scenario-Adaptive MoE module that augments a shared policy with multiple experts, thus enabling adaptive specialization and more flexible scenario-specific strategies. We conduct extensive experiments in the SUMO simulator on both synthetic and real-world traffic datasets. Compared with state-of-the-art baselines, CROSS achieves superior performance and generalization through improved representation of diverse traffic scenarios.

CROSS: A Mixture-of-Experts Reinforcement Learning Framework for Generalizable Large-Scale Traffic Signal Control

Abstract

Recent advances in robotics, automation, and artificial intelligence have enabled urban traffic systems to operate with increasing autonomy towards future smart cities, powered in part by the development of adaptive traffic signal control (ATSC), which dynamically optimizes signal phases to mitigate congestion and optimize traffic. However, achieving effective and generalizable large-scale ATSC remains a significant challenge due to the diverse intersection topologies and highly dynamic, complex traffic demand patterns across the network. Existing RL-based methods typically use a single shared policy for all scenarios, whose limited representational capacity makes it difficult to capture diverse traffic dynamics and generalize to unseen environments. To address these challenges, we propose CROSS, a novel Mixture-of-Experts (MoE)-based decentralized RL framework for generalizable ATSC. We first introduce a Predictive Contrastive Clustering (PCC) module that forecasts short-term state transitions to identify latent traffic patterns, followed by clustering and contrastive learning to enhance pattern-level representation. We further design a Scenario-Adaptive MoE module that augments a shared policy with multiple experts, thus enabling adaptive specialization and more flexible scenario-specific strategies. We conduct extensive experiments in the SUMO simulator on both synthetic and real-world traffic datasets. Compared with state-of-the-art baselines, CROSS achieves superior performance and generalization through improved representation of diverse traffic scenarios.

Paper Structure

This paper contains 23 sections, 15 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: (a) Overview of a typical T-junction, which consists of six incoming lanes, six outgoing lanes, 12 traffic movements, and three traffic phases. (b) Traffic state and phase state vectors used in CROSS, with elements of both constructed based on ordered traffic movements.
  • Figure 2: Architecture of the proposed CROSS framework, which includes a GFE module to extract rich traffic dynamics and form a unified representation for heterogeneous intersections, a PCC module to predict traffic dynamics and match them with learnable cluster centers for generalizable pattern representations, and a Scenario-Adaptive MoE module that uses these representations as routing context to compute weights and generate the traffic policy and value functions.
  • Figure 3: Average trip duration across three synthetic training datasets comparing CROSS with baseline methods (GESA, Unicorn) and ablations (CROSS w/o MoE, CROSS w/o PCC). Here $\downarrow$ denotes lower is better.