SymLight: Exploring Interpretable and Deployable Symbolic Policies for Traffic Signal Control
Xiao-Cheng Liao, Yi Mei, Mengjie Zhang
TL;DR
SymLight addresses the deployment gap of traffic signal control based on deep reinforcement learning (DRL) by learning interpretable symbolic priority functions via Monte Carlo Tree Search. The approach combines a concise token-based representation, movement-level features, and a probabilistic structural rollout (PSR) to search efficiently for high-quality expressions, and it uses a global objective as the reward to avoid reward-objective misalignment. Key innovations include the token-based priority-function representation, PSR-guided rollouts, and adaptive reward shaping, which together yield scalable, edge-friendly policies that maintain strong performance. Experiments on six real-world CityFlow networks show better travel time and throughput than diverse baselines, while producing human-understandable rules and demonstrating robust generalization and deployability.
Abstract
Deep reinforcement learning (DRL) has achieved significant success in automatically devising effective traffic signal control (TSC) policies. Neural policies, however, tend to be over-parameterized and non-transparent, which hinders their interpretability and their deployability on resource-limited edge devices. This work presents SymLight, a priority function search framework based on Monte Carlo Tree Search (MCTS) for discovering inherently interpretable and deployable symbolic priority functions that serve as TSC policies. Specifically, the priority function takes traffic features as input and outputs a priority for each traffic signal phase, which then directs the phase transition. For effective search, we propose a concise yet expressive priority function representation that helps mitigate the combinatorial explosion of the action space in MCTS. Additionally, a probabilistic structural rollout strategy is introduced that leverages structural patterns from previously discovered high-quality priority functions to guide the rollout process. Our experiments on real-world datasets demonstrate SymLight's superior performance over a range of baselines. A key advantage is SymLight's ability to produce interpretable and deployable TSC policies while maintaining excellent performance.
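To make the idea of a symbolic priority function concrete, the following minimal Python sketch evaluates a token sequence on movement-level features and selects the phase with the highest priority. It is an illustration under stated assumptions, not SymLight's actual implementation: the prefix-notation encoding, the operator set, the feature names `queue_in` and `queue_out`, the aggregation of movement priorities by summation, and the argmax selection rule are all hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical binary operator tokens; the actual token set is an assumption.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def eval_prefix(tokens: Sequence[str], features: Dict[str, float]) -> float:
    """Evaluate a prefix-notation token sequence on one movement's features."""

    def walk(i: int) -> Tuple[float, int]:
        # Returns the value of the subexpression starting at index i
        # together with the index of the first token after it.
        tok = tokens[i]
        if tok in OPS:
            left, i = walk(i + 1)
            right, i = walk(i)
            return OPS[tok](left, right), i
        return features[tok], i + 1

    value, end = walk(0)
    if end != len(tokens):
        raise ValueError("trailing tokens in expression")
    return value


def select_phase(
    tokens: Sequence[str],
    phases: Dict[str, List[str]],
    movement_features: Dict[str, Dict[str, float]],
) -> Tuple[str, Dict[str, float]]:
    """Score each phase and return the highest-priority phase.

    Assumption: a phase's priority is the sum of the priorities of the
    movements it serves.
    """
    priorities = {
        phase: sum(eval_prefix(tokens, movement_features[m]) for m in moves)
        for phase, moves in phases.items()
    }
    return max(priorities, key=priorities.get), priorities


# Hypothetical expression: upstream queue minus downstream queue.
tokens = ["-", "queue_in", "queue_out"]

# Hypothetical movement-level features for a two-phase intersection.
movement_features = {
    "N_S": {"queue_in": 6.0, "queue_out": 1.0},
    "S_N": {"queue_in": 4.0, "queue_out": 2.0},
    "E_W": {"queue_in": 3.0, "queue_out": 0.0},
    "W_E": {"queue_in": 2.0, "queue_out": 3.0},
}
phases = {"NS": ["N_S", "S_N"], "EW": ["E_W", "W_E"]}

best_phase, priorities = select_phase(tokens, phases, movement_features)
print(best_phase, priorities)
```

Because such a policy is a short arithmetic expression rather than a neural network, a traffic engineer can read it directly, and evaluating it requires only a handful of floating-point operations per movement, which is the property that makes this kind of policy attractive for resource-limited edge devices.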
