SymLight: Exploring Interpretable and Deployable Symbolic Policies for Traffic Signal Control

Xiao-Cheng Liao, Yi Mei, Mengjie Zhang

TL;DR

SymLight addresses the deployment gap of traffic signal control based on deep reinforcement learning (DRL) by learning interpretable symbolic priority functions via Monte Carlo Tree Search (MCTS). The approach uses a concise token-based representation, movement-level features, and a probabilistic structural rollout (PSR) to search efficiently for high-quality expressions, with a global objective as the reward to avoid reward-objective misalignment. Key innovations include the token-based priority-function representation, PSR-guided rollouts, and adaptive reward shaping, enabling scalable, edge-friendly policies that maintain strong performance. Experiments on six real-world CityFlow networks show shorter travel time and higher throughput than a diverse set of baselines, while producing human-understandable rules and showing robust generalization and deployability.

Abstract

Deep reinforcement learning has achieved significant success in automatically devising effective traffic signal control (TSC) policies. Neural policies, however, tend to be over-parameterized and non-transparent, hindering their interpretability and deployability on resource-limited edge devices. This work presents SymLight, a priority function search framework based on Monte Carlo Tree Search (MCTS) for discovering inherently interpretable and deployable symbolic priority functions to serve as the TSC policies. The priority function accepts traffic features as input and outputs a priority for each traffic signal phase, which then directs the phase transition. For effective search, we propose a concise yet expressive priority function representation, which helps mitigate the combinatorial explosion of the action space in MCTS. Additionally, a probabilistic structural rollout strategy is introduced to guide the rollout process by leveraging structural patterns from previously discovered high-quality priority functions. Our experiments on real-world datasets demonstrate SymLight's superior performance over a range of baselines. A key advantage is SymLight's ability to produce interpretable and deployable TSC policies while maintaining excellent performance.
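
To make the role of a priority function concrete, here is a minimal Python sketch of phase selection driven by a symbolic expression. It reuses the expression from the Figure 3 caption, WI × WI − WO, purely as an example. Everything else is an assumption made for illustration and is not taken from the paper: the reading of WI and WO as queue-like features of a movement's incoming and outgoing lanes, the rule that a phase's priority is the sum over its movements, the phase and movement names, and the helper names `example_priority` and `select_phase`.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]  # feature token -> value for one traffic movement


def example_priority(f: Features) -> float:
    """Example symbolic priority function: WI * WI - WO (expression from Figure 3)."""
    return f["WI"] * f["WI"] - f["WO"]


def select_phase(
    phases: Dict[str, List[str]],
    movement_features: Dict[str, Features],
    priority: Callable[[Features], float],
) -> Tuple[str, Dict[str, float]]:
    """Score every phase and return the highest-priority one.

    Assumption: a phase's priority is the sum of the priorities of the
    movements it serves; the paper may aggregate differently.
    """
    scores = {
        name: sum(priority(movement_features[m]) for m in moves)
        for name, moves in phases.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical two-phase intersection with made-up feature values.
PHASES = {"NS": ["n_to_s", "s_to_n"], "EW": ["e_to_w", "w_to_e"]}
FEATURES = {
    "n_to_s": {"WI": 3.0, "WO": 1.0},  # 3*3 - 1 = 8
    "s_to_n": {"WI": 2.0, "WO": 0.0},  # 2*2 - 0 = 4
    "e_to_w": {"WI": 1.0, "WO": 2.0},  # 1*1 - 2 = -1
    "w_to_e": {"WI": 4.0, "WO": 5.0},  # 4*4 - 5 = 11
}

best_phase, phase_scores = select_phase(PHASES, FEATURES, example_priority)
print(best_phase, phase_scores)  # NS {'NS': 12.0, 'EW': 10.0}
```

Because the whole policy is one arithmetic expression plus an argmax, it can be read directly and evaluated with a handful of floating-point operations, which is the kind of property the abstract points to when it calls such policies interpretable and deployable.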

Paper Structure

This paper contains 32 sections, 3 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: An illustration of a single phase decision at an intersection, showing the role of the priority function in SymLight. The priority function takes lane-level traffic features as input to determine the priority of each phase.
  • Figure 2: The overall framework of SymLight for exploring priority functions. To facilitate illustration, the state of an MCTS node is represented by the token list comprising all nodes on the path from the root to that node.
  • Figure 3: Example of priority function expansion. The priority function undergoes a sequential random expansion, proceeding from left to right. This process continues until $\mathbb{R}(\pi)=0$, when its expansion is finalized, culminating in the mathematical expression $\pi(\cdot) = \text{WI}\times\text{WI} - \text{WO}$.
  • Figure 4: (a) Two example priority functions obtained on the Hangzhou1 dataset. (b) The occurrence frequency of each traffic feature within the optimal solutions across six scenarios.
  • Figure A5: The ablation study results in terms of average travel time, where lower values indicate better performance. (FM = SymLight, $\pi$ = $\pi$-Light)
  • ...and 3 more figures
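
The expansion process described in the Figure 3 caption can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the paper's implementation: it assumes the token list is written in prefix order, that $\mathbb{R}(\pi)$ counts the operand slots still unfilled (so an expression is complete when it reaches 0), and that the token set is just three binary operators plus the two features WI and WO named in the caption. The names `remaining`, `random_expand`, `evaluate`, and the `max_len` cap are hypothetical.

```python
import random
from typing import Dict, List

# Assumed token set: binary operators and the two feature tokens from Figure 3.
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
TERMINALS = ["WI", "WO"]


def remaining(tokens: List[str]) -> int:
    """Open operand slots in a prefix token list (assumed meaning of R(pi))."""
    slots = 1  # the empty expression needs exactly one sub-expression
    for tok in tokens:
        if slots == 0:
            raise ValueError("tokens continue past a complete expression")
        # A binary operator fills one slot and opens two; a terminal fills one.
        slots += 1 if tok in BINARY else -1
    return slots


def random_expand(rng: random.Random, max_len: int = 9) -> List[str]:
    """Append random tokens left to right until no slot is open."""
    tokens: List[str] = []
    while remaining(tokens) > 0:
        budget = max_len - len(tokens)
        # Allow an operator only if the slots it opens can still be closed
        # with terminals inside the length cap.
        allow_op = remaining(tokens) + 2 <= budget
        choices = TERMINALS + (list(BINARY) if allow_op else [])
        tokens.append(rng.choice(choices))
    return tokens


def evaluate(tokens: List[str], features: Dict[str, float]) -> float:
    """Evaluate a complete prefix token list on one feature dictionary."""
    def parse(i: int):
        tok = tokens[i]
        if tok in BINARY:
            left, i = parse(i + 1)
            right, i = parse(i)
            return BINARY[tok](left, right), i
        return features[tok], i + 1

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return value


# Prefix form of the Figure 3 result, WI * WI - WO.
FIG3_TOKENS = ["-", "*", "WI", "WI", "WO"]
print([remaining(FIG3_TOKENS[:k]) for k in range(6)])  # [1, 2, 3, 2, 1, 0]
print(evaluate(FIG3_TOKENS, {"WI": 3.0, "WO": 1.0}))   # 8.0
print(random_expand(random.Random(0)))
```

Tracking a single slot counter keeps every partial token list a valid prefix of some expression, so a search procedure only ever has to choose the next token rather than repair malformed trees.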