GuideLight: "Industrial Solution" Guidance for More Practical Traffic Signal Control Agents

Haoyuan Jiang, Xuantang Xiong, Ziyue Li, Hangyu Mao, Guanghu Sui, Jingqing Ruan, Yuheng Cheng, Hua Wei, Wolfgang Ketter, Rui Zhao

TL;DR

GuideLight tackles the gap between reinforcement learning (RL) for traffic signal control (TSC) and real-world industrial requirements by constraining inputs to traffic flow, enforcing cyclic phase durations as outputs, and preserving a non-decreasing cycle-flow relation. It combines behavior cloning from industry solutions (e.g., SCATS) with curriculum learning and an RL actor-critic, guiding policy development while still allowing exploration. Training minimizes the combined loss $\mathcal{L} = \alpha \mathcal{L}_{Actor} + \beta \mathcal{L}_{Critic} + \kappa \mathcal{L}_{BC}$. The authors prove that such guidance yields a sample complexity polynomial in the horizon $H$, and empirically demonstrate superior performance and cycle-flow synchronization on a SUMO-based Fenglin dataset with real 24-hour flow data across 10 intersections. The work advances practical deployment of RL for TSC by ensuring compatibility with industry hardware, improving stability, and offering sample-complexity guarantees for training.
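As a rough illustration of how the three terms in the loss above combine, the following minimal sketch computes a weighted sum of scalar losses. The mean-squared-error form of the behavior-cloning term, the default weights, and the names `mse` and `guided_loss` are all assumptions made for illustration; the summary does not specify how the authors define or weight each term.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def guided_loss(actor_loss, critic_loss, bc_loss, alpha=1.0, beta=0.5, kappa=0.1):
    """Weighted sum L = alpha * L_Actor + beta * L_Critic + kappa * L_BC.

    The default weights are placeholders, not values from the paper.
    """
    return alpha * actor_loss + beta * critic_loss + kappa * bc_loss


# Hypothetical example: the agent's cycle lengths versus a teacher's
# (e.g., an industrial controller), in seconds.
agent_cycles = [60.0, 90.0]
teacher_cycles = [60.0, 100.0]

# Assumed BC term: squared gap between agent and teacher cycle lengths.
bc = mse(agent_cycles, teacher_cycles)
total = guided_loss(actor_loss=2.0, critic_loss=4.0, bc_loss=bc)
print(bc, total)
```

A larger $\kappa$ pulls the agent toward the teacher's cycle lengths, while a smaller $\kappa$ leaves more room for RL exploration.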

Abstract

Currently, traffic signal control (TSC) methods based on reinforcement learning (RL) have proven superior to traditional methods. However, most RL methods face difficulties when applied in the real world due to three factors: input, output, and the cycle-flow relation. The input observable in industry is far more limited than what simulation-based RL methods assume: in real-world solutions, only flow can be reliably collected, whereas common RL methods need more. For the output action, most RL methods focus on acyclic control, which real-world signal controllers do not support. Most importantly, industry standards require a consistent cycle-flow relationship: it must be non-decreasing, with different response strategies for low, medium, and high-level flows, which RL methods ignore. To narrow the gap between RL methods and industry standards, we propose to use industry solutions to guide the RL agent. Specifically, we design behavior cloning and curriculum learning to guide the agent to mimic and meet industry requirements while still leveraging the exploration and exploitation of RL for better performance. We theoretically prove that such guidance can reduce the sample complexity of searching for an optimal policy to a polynomial in the horizon. Our rigorous experiments show that our method achieves a good cycle-flow relation and superior performance.
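The non-decreasing cycle-flow requirement described above can be checked on logged data with a few lines of code. This is a minimal sketch, not the authors' evaluation procedure; the function name, the tolerance parameter, and the sample values are hypothetical.

```python
def cycle_flow_is_non_decreasing(flows, cycle_times, tol=0.0):
    """Return True if cycle time never drops as traffic flow increases.

    flows and cycle_times are paired observations (e.g., vehicles/hour
    and seconds). Pairs are sorted by flow; observations with equal flow
    are ordered by cycle time, so ties never count as violations.
    """
    if len(flows) != len(cycle_times):
        raise ValueError("flows and cycle_times must have equal length")
    pairs = sorted(zip(flows, cycle_times))
    return all(
        later[1] >= earlier[1] - tol
        for earlier, later in zip(pairs, pairs[1:])
    )


# Hypothetical logs: the first policy lengthens the cycle as flow grows,
# the second shortens it at the highest flow.
flows = [300, 100, 200]
print(cycle_flow_is_non_decreasing(flows, [120, 60, 90]))
print(cycle_flow_is_non_decreasing(flows, [80, 60, 90]))
```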

Paper Structure (22 sections, 20 equations, 7 figures, 2 tables, 1 algorithm)

Figures (7)

  • Figure 1: Illustration of a standard intersection and phases. A phase is standardized as a combination of two non-conflicting movements, e.g., phase A is movements 1 and 5 combined. In industry, a four-phase cycle control, i.e., A$\to$D$\to$E$\to$H, is common, representing EW-Through $\to$ EW-Left $\to$ SN-Through $\to$ SN-Left.
  • Figure 2: The gaps between industry requirements and RL-based academic methods. Industrial requirements: (a1) controllers follow cyclic control, e.g., the cycle of phases A-D-E-H (details in Fig. 1), and decide the duration of each phase; (a2) the cycle time should be non-decreasing with traffic flow, encouraging stable driver behavior. The relation also has three stages (non-peak, climbing, and peak), each with its own control pattern. Academic solutions: (b1) common RL-based methods in academia control acyclically, i.e., they choose one phase out of eight (A-H) for the next time interval, with each chosen phase running for a fixed interval, e.g., 10 seconds; (b2) the resulting cycle time-traffic flow relation is diverse and unstable, which disrupts drivers. Our solution is the first RL-based cyclic TSC agent, and it follows the non-decreasing relation with three clear stages.
  • Figure 3: Illustration of our proposed GuideLight. One agent controls one intersection. (a) Each agent observes movement-level features related to traffic dynamics and then uses FRAP's aggregating module to combine the embeddings of two non-conflicting movements into one phase traffic embedding. (b) Combining the phase traffic embedding with other phase-level features gives a complete embedding for each phase, which is then fed into an LSTM and the Actor-Critic. (c) Teachers such as a linear controller and the industrial controller SCATS guide the agent, via Behavior Cloning and Curriculum Learning, so that its cycle length (CL) mimics the teacher's CL.
  • Figure 4: Illustration of Fenglin scenario.
  • Figure 5: The synchronization between cycle time and traffic flow in the Fenglin dataset. As shown in (e2), GuideLight outputs control policies whose cycle time closely follows the flow and thus meets industry requirements. Fig. 2(b2) summarizes the overall cycle time-traffic flow relation.
  • ...and 2 more figures
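
The cyclic control described in the captions of Figures 1 and 2 can be pictured as an action that assigns a duration to every phase of a fixed cycle, rather than picking one phase at a time. The sketch below is illustrative only: the phase order A, D, E, H comes from Figure 1, while the function name and the durations are hypothetical.

```python
# Fixed phase order of the four-phase cycle described in Figure 1:
# EW-Through, EW-Left, SN-Through, SN-Left.
CYCLE_ORDER = ("A", "D", "E", "H")


def cyclic_plan(durations):
    """Build one signal cycle from per-phase durations in seconds.

    durations maps each phase in CYCLE_ORDER to its green time.
    Returns the ordered schedule and the total cycle length.
    """
    schedule = [(phase, durations[phase]) for phase in CYCLE_ORDER]
    cycle_length = sum(seconds for _, seconds in schedule)
    return schedule, cycle_length


# Hypothetical action: the agent outputs four durations, and the phases
# always run in the same order.
schedule, cycle_length = cyclic_plan({"A": 30, "D": 15, "E": 30, "H": 15})
print(schedule, cycle_length)
```

An acyclic agent, by contrast, would emit a single phase label per decision step, so neither the phase order nor the cycle length is fixed in advance.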