Evolutionary Discovery of Heuristic Policies for Traffic Signal Control

Ruibing Wang, Shuhan Guo, Zeen Li, Zhen Wang, Quanming Yao

TL;DR

The paper tackles the trade-off between generality and specialization in traffic signal control by introducing TPET, an LLM-driven evolutionary framework that discovers specialized, lightweight heuristics without training. TPET uses Structured State Abstraction (SSA) to translate high-dimensional traffic data into temporal-logical facts and Credit Assignment Feedback (CAF) to provide defect-driven, post-hoc critiques, guiding iterative policy mutations. Experiments on CityFlow with multiple real-world datasets show TPET outperforms traditional heuristics and rival methods, while exhibiting superior stability compared to online LLM actors. The work offers a practical, interpretable alternative to fixed controls and opaque DRL/LLM approaches, with potential for rapid deployment in adaptive traffic environments.

Abstract

Traffic Signal Control (TSC) involves a challenging trade-off: classic heuristics are efficient but oversimplified, while Deep Reinforcement Learning (DRL) achieves high performance yet suffers from poor generalization and opaque policies. Online Large Language Models (LLMs) provide general reasoning but incur high latency and lack environment-specific optimization. To address these issues, we propose Temporal Policy Evolution for Traffic (TPET), which uses LLMs as an evolution engine to derive specialized heuristic policies. The framework introduces two key modules: (1) Structured State Abstraction (SSA), converting high-dimensional traffic data into temporal-logical facts for reasoning; and (2) Credit Assignment Feedback (CAF), tracing flawed micro-decisions to poor macro-outcomes for targeted critique. Operating entirely at the prompt level without training, TPET yields lightweight, robust policies optimized for specific traffic environments, outperforming both heuristics and online LLM actors.
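
To make the "LLM as evolution engine" idea concrete, the following is a minimal toy sketch of a critique-guided evolution loop. It is not the paper's implementation: the policy is reduced to a single switching threshold, `simulate` is a hypothetical stand-in for a simulator rollout that also emits CAF-style textual defects, and `mutate` is a hypothetical stand-in for the LLM call that would rewrite the heuristic. All names and numbers are assumptions chosen for illustration.

```python
import random
from typing import List, Tuple

# Assumption: a candidate policy is a single number, the queue length at which
# the signal switches phase. Real evolved heuristics would be code, not a scalar.
Policy = float


def simulate(policy: Policy) -> Tuple[float, List[str]]:
    """Toy stand-in for a simulator rollout.

    Returns a cost (lower is better) and a list of textual defects that
    loosely play the role of post-hoc critique.
    """
    ideal = 12.0  # hypothetical optimum of this toy environment
    cost = abs(policy - ideal)
    defects: List[str] = []
    if policy < ideal:
        defects.append("switches too early")
    elif policy > ideal:
        defects.append("holds green too long")
    return cost, defects


def mutate(policy: Policy, defects: List[str], rng: random.Random) -> Policy:
    """Toy stand-in for the LLM call: nudge the policy as the critique suggests."""
    step = rng.uniform(0.5, 2.0)
    if "switches too early" in defects:
        return policy + step
    if "holds green too long" in defects:
        return policy - step
    return policy


def evolve(initial: Policy, iterations: int, seed: int = 0) -> Policy:
    """Repeatedly mutate the best policy so far, keeping only improvements."""
    rng = random.Random(seed)
    best = initial
    best_cost, defects = simulate(best)
    for _ in range(iterations):
        candidate = mutate(best, defects, rng)
        cost, candidate_defects = simulate(candidate)
        if cost < best_cost:
            best, best_cost, defects = candidate, cost, candidate_defects
    return best
```

Calling `evolve(2.0, 50)` returns a threshold closer to the toy optimum than the starting value. The point of the sketch is only the loop shape: evaluate, critique, mutate in the direction of the critique, and keep the candidate if it improves.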

Paper Structure

This paper contains 22 sections, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Framework of TPET, depicting an LLM iteratively evolving heuristic traffic control policies by receiving actionable defect feedback from the Credit Assignment Feedback (CAF) module, which analyzes real-time traffic conditions abstracted by the Structured State Abstraction (SSA) module.
  • Figure 2: Detailed design of the key modules.
  • Figure 3: Performance and robustness comparison on key metrics (ATT, AQL, AWT). Lower values are better for all metrics. Each point represents the mean value, and the error bar represents the standard deviation (±SD) across multiple runs. Note the Y-axis is zoomed in for each metric to highlight variance. TPET achieves top-tier performance with minimal variance, in stark contrast to the high instability (large error bars) of LLM-based actors.
  • Figure 4: Evolution of TPET for TSC. We outline the key outputs of the SSA and CAF modules. Moreover, we present the best algorithm in the final iteration and compare it with Maxpressure.
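
For readers unfamiliar with the Maxpressure baseline mentioned in Figure 4, the sketch below shows the usual form of the rule: the pressure of a phase is the sum, over the movements it serves, of the incoming-lane queue minus the outgoing-lane queue, and the controller activates the phase with the highest pressure. The data layout (`phases`, `queue`, lane names) is hypothetical and not taken from the paper or from CityFlow.

```python
from typing import Dict, List, Tuple

# Assumption: a movement is a pair (incoming lane, outgoing lane).
Movement = Tuple[str, str]


def phase_pressure(movements: List[Movement], queue: Dict[str, int]) -> int:
    """Sum of (incoming queue - outgoing queue) over the movements of a phase."""
    return sum(queue[lane_in] - queue[lane_out] for lane_in, lane_out in movements)


def max_pressure_phase(phases: Dict[str, List[Movement]], queue: Dict[str, int]) -> str:
    """Return the name of the phase with the largest pressure."""
    return max(phases, key=lambda name: phase_pressure(phases[name], queue))


# Hypothetical four-approach intersection with two phases.
phases = {
    "NS": [("n_in", "s_out"), ("s_in", "n_out")],
    "EW": [("e_in", "w_out"), ("w_in", "e_out")],
}
queue = {
    "n_in": 8, "s_in": 5, "e_in": 3, "w_in": 4,
    "n_out": 2, "s_out": 1, "e_out": 6, "w_out": 0,
}
chosen = max_pressure_phase(phases, queue)
```

With these numbers the NS phase has pressure (8 - 1) + (5 - 2) = 10 and the EW phase has pressure (3 - 0) + (4 - 6) = 1, so `chosen` is `"NS"`.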