Evolutionary Discovery of Heuristic Policies for Traffic Signal Control
Ruibing Wang, Shuhan Guo, Zeen Li, Zhen Wang, Quanming Yao
TL;DR
The paper tackles the trade-off between generality and specialization in traffic signal control by introducing TPET, an LLM-driven evolutionary framework that discovers specialized, lightweight heuristics without training. TPET uses Structured State Abstraction (SSA) to translate high-dimensional traffic data into temporal-logical facts, and Credit Assignment Feedback (CAF) to produce defect-driven, post-hoc critiques that guide iterative policy mutations. Experiments on CityFlow with multiple real-world datasets show that TPET outperforms traditional heuristics and competing methods and is more stable than online LLM actors. The work offers a practical, interpretable alternative to fixed controls and opaque DRL/LLM approaches, with potential for rapid deployment in adaptive traffic environments.
Abstract
Traffic Signal Control (TSC) involves a challenging trade-off: classic heuristics are efficient but oversimplified, while Deep Reinforcement Learning (DRL) achieves high performance yet suffers from poor generalization and opaque policies. Online Large Language Models (LLMs) provide general reasoning but incur high latency and lack environment-specific optimization. To address these issues, we propose Temporal Policy Evolution for Traffic (TPET), which uses LLMs as an evolution engine to derive specialized heuristic policies. The framework introduces two key modules: (1) Structured State Abstraction (SSA), which converts high-dimensional traffic data into temporal-logical facts for reasoning; and (2) Credit Assignment Feedback (CAF), which traces poor macro-outcomes back to the flawed micro-decisions behind them for targeted critique. Operating entirely at the prompt level without training, TPET yields lightweight, robust policies optimized for specific traffic environments, outperforming both heuristics and online LLM actors.
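To make the shape of such a loop concrete, the following is a minimal, hypothetical sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the two-approach toy simulator stands in for CityFlow, a policy is reduced to two integer parameters (`switch_threshold`, `min_green`), `abstract_state` is a stand-in for SSA, `critique` is a stand-in for CAF, and the rule-based `mutate` function takes the place of the LLM that would rewrite the heuristic from the critique.

```python
import random
from typing import Dict, List, Tuple

Policy = Dict[str, int]  # hypothetical: a heuristic reduced to two integer knobs


def abstract_state(history: List[int], window: int = 3) -> List[str]:
    """SSA stand-in: turn recent raw queue counts into symbolic temporal facts."""
    recent = history[-window:]
    facts = []
    if len(recent) >= 2 and all(b > a for a, b in zip(recent, recent[1:])):
        facts.append("queue_rising")
    if recent and recent[-1] == 0:
        facts.append("queue_empty")
    return facts


def simulate(policy: Policy, steps: int = 200, seed: int = 0) -> Tuple[int, List[str]]:
    """Toy two-approach intersection (not CityFlow). Returns (total wait, defects)."""
    rng = random.Random(seed)
    queues, green, held, total_wait = [0, 0], 0, 0, 0
    red_history: List[int] = []
    defects: List[str] = []
    for _ in range(steps):
        for i in range(2):  # random arrivals on both approaches
            queues[i] += 1 if rng.random() < 0.4 else 0
        if queues[green] > 0:  # one vehicle leaves on green
            queues[green] -= 1
        red = 1 - green
        red_history.append(queues[red])
        held += 1
        if queues[red] >= policy["switch_threshold"] and held >= policy["min_green"]:
            green, held, red_history = red, 0, []
        elif "queue_rising" in abstract_state(red_history) and queues[red] >= 3:
            # micro-decision flagged for later critique
            defects.append("held_green_while_red_queue_rising")
        total_wait += sum(queues)
    return total_wait, defects


def critique(defects: List[str]) -> str:
    """CAF stand-in: summarize flagged micro-decisions into one piece of feedback."""
    return "switch_too_late" if len(defects) > 5 else "no_clear_defect"


def mutate(policy: Policy, feedback: str, rng: random.Random) -> Policy:
    """Placeholder for the LLM edit: a targeted change if the critique names a defect."""
    child = dict(policy)
    if feedback == "switch_too_late":
        child["switch_threshold"] = max(1, child["switch_threshold"] - 1)
    else:  # otherwise explore with a small random tweak
        key = rng.choice(sorted(child))
        child[key] = max(1, child[key] + rng.choice([-1, 1]))
    return child


def evolve(seed_policy: Policy, generations: int = 10, seed: int = 0) -> Tuple[Policy, int]:
    """Elitist loop: keep a child only if it lowers the simulated total wait."""
    rng = random.Random(seed)
    best = dict(seed_policy)
    best_cost, defects = simulate(best)
    for _ in range(generations):
        child = mutate(best, critique(defects), rng)
        cost, child_defects = simulate(child)
        if cost < best_cost:
            best, best_cost, defects = child, cost, child_defects
        else:
            defects = []  # rejected edit: fall back to random exploration next round
    return best, best_cost


if __name__ == "__main__":
    print(evolve({"switch_threshold": 8, "min_green": 5}))
```

Because the loop is elitist, the returned policy is never worse on the toy simulator than the seed policy it started from. The point of the sketch is only the division of labor: symbolic facts feed the critique, and the critique steers the next mutation, so no gradient-based training is involved.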
