Towards Real-World Deployment of Reinforcement Learning for Traffic Signal Control
Arthur Müller, Vishal Rangras, Georg Schnittker, Michael Waldmann, Maxim Friesen, Tobias Ferfers, Lukas Schreckenberg, Florian Hufen, Jürgen Jasperneite, Marco Wiering
TL;DR
The paper tackles the gap between RL-based traffic signal control (TSC) research and real-world deployment by introducing LemgoRL, a high-fidelity, open-source benchmark built on SUMO for the OWL322 intersection in Lemgo, Germany. It includes a Traffic Signal Logic Unit that enforces safety and regulatory requirements. The paper demonstrates a practical training pipeline using Proximal Policy Optimization (PPO) with RLlib on a CPU cluster, together with a realistic traffic demand model and a Gym-compatible interface to encourage broad adoption. Key contributions are a realistic simulation model; an explicit MDP formulation with detailed state, action, and reward design; and realistic phase transitions that support transfer to the real world. The work advances the practical viability of RL-based TSC by showing improved performance over rule-based baselines and by outlining a concrete path toward real-world deployment at OWL322 and similar intersections.
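A Traffic Signal Logic Unit sits between the agent's phase requests and the signals actually shown on the street. The sketch below is a hypothetical illustration of that idea, not LemgoRL's implementation. It assumes integer phase ids, one-second time steps, a minimum green time (`min_green`), and a fixed clearance period (`intergreen`) inserted before every phase change. The class name `SignalLogicUnit` and all of its fields are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class SignalLogicUnit:
    """Hypothetical safety layer between an RL agent and the signal heads.

    Assumed rules (illustrative, not LemgoRL's actual logic):
      * a phase must stay green for at least `min_green` steps;
      * every phase change inserts `intergreen` clearance steps;
      * requests that violate these rules are held, not executed.
    """
    n_phases: int
    min_green: int = 5
    intergreen: int = 3
    current: int = 0         # phase currently (or most recently) green
    green_elapsed: int = 0   # steps the current phase has been green
    clearance_left: int = 0  # remaining intergreen steps, 0 when not clearing
    target: int = 0          # phase that turns green after clearance

    def __post_init__(self):
        if self.intergreen < 1:
            raise ValueError("intergreen must be at least one step")

    def step(self, requested: int) -> str:
        """Advance one time step and return the signal state actually shown."""
        if not 0 <= requested < self.n_phases:
            raise ValueError(f"unknown phase {requested}")
        # Start a transition only if not already clearing, the request differs,
        # and the current phase has served its minimum green time.
        if (self.clearance_left == 0 and requested != self.current
                and self.green_elapsed >= self.min_green):
            self.target = requested
            self.clearance_left = self.intergreen
        if self.clearance_left > 0:
            # During clearance (e.g. yellow plus all-red) no phase is green;
            # new requests are ignored until the target phase turns green.
            self.clearance_left -= 1
            if self.clearance_left == 0:
                self.current = self.target
                self.green_elapsed = 0
            return "intergreen"
        self.green_elapsed += 1
        return f"green:{self.current}"


if __name__ == "__main__":
    lu = SignalLogicUnit(n_phases=2, min_green=3, intergreen=2)
    # The agent asks for phase 1 at every step, but the switch is deferred.
    print([lu.step(1) for _ in range(6)])
    # → ['green:0', 'green:0', 'green:0', 'intergreen', 'intergreen', 'green:1']
```

Requests that arrive too early are held rather than rejected, so the agent can choose any phase at any step while the signal output stays within the assumed timing constraints.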
Abstract
Sub-optimal control policies in intersection traffic signal controllers (TSC) contribute to congestion and lead to negative effects on human health and the environment. Reinforcement learning (RL) for traffic signal control is a promising approach to designing better control policies and has attracted considerable research interest in recent years. However, most work in this area has used simplified simulation environments of traffic scenarios to train RL-based TSC. To deploy RL in real-world traffic systems, the gap between simplified simulation environments and real-world applications has to be closed. Therefore, we propose LemgoRL, a benchmark tool to train RL agents as TSC in a realistic simulation environment of Lemgo, a medium-sized town in Germany. In addition to the realistic simulation model, LemgoRL encompasses a traffic signal logic unit that ensures compliance with all regulatory and safety requirements. LemgoRL offers the same interface as the well-known OpenAI Gym toolkit to enable easy use in existing research work. To demonstrate the functionality and applicability of LemgoRL, we train a state-of-the-art deep RL algorithm on a CPU cluster using a framework for distributed and parallel RL and compare its performance with other methods. Our benchmark tool drives the development of RL algorithms towards real-world applications.
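Because LemgoRL exposes a Gym-style interface, any agent that implements the usual `reset()`/`step()` loop can drive it. The self-contained sketch below assumes the classic Gym convention, where `step(action)` returns `(observation, reward, done, info)`. The two-approach queue model (`ToyIntersectionEnv`), its arrival probabilities, the negative-total-queue reward, and the two baseline policies are all hypothetical stand-ins for the SUMO-based OWL322 model and are not part of LemgoRL.

```python
import random


class ToyIntersectionEnv:
    """Hypothetical two-approach intersection with a classic Gym-style API.

    Not LemgoRL: a stand-in queue model used only to show the interface.
    """

    def __init__(self, arrival_probs=(0.5, 0.3), service=2, horizon=100, seed=0):
        self.arrival_probs = arrival_probs  # per-step arrival probability per approach
        self.service = service              # vehicles discharged per green step
        self.horizon = horizon              # episode length in steps
        self.seed = seed
        self.reset()

    def reset(self):
        # Re-seeding makes every episode see the same arrival sequence,
        # so different policies can be compared on identical demand.
        self.rng = random.Random(self.seed)
        self.queues = [0] * len(self.arrival_probs)
        self.t = 0
        return tuple(self.queues)

    def step(self, action):
        # Vehicles arrive independently on each approach.
        for i, p in enumerate(self.arrival_probs):
            if self.rng.random() < p:
                self.queues[i] += 1
        # Only the approach given green discharges vehicles.
        self.queues[action] = max(0, self.queues[action] - self.service)
        self.t += 1
        reward = -float(sum(self.queues))  # penalize total queue length
        done = self.t >= self.horizon
        return tuple(self.queues), reward, done, {}


def fixed_time(obs, t, half_cycle=5):
    """Rule-based baseline: alternate green every `half_cycle` steps."""
    return (t // half_cycle) % 2


def longest_queue(obs, t):
    """Greedy baseline: give green to the longer of the two queues."""
    return 0 if obs[0] >= obs[1] else 1


def run_episode(env, policy):
    """Standard agent-environment loop: reset once, then step until done."""
    obs = env.reset()
    done, t, episode_return = False, 0, 0.0
    while not done:
        action = policy(obs, t)
        obs, reward, done, _info = env.step(action)
        episode_return += reward
        t += 1
    return episode_return


if __name__ == "__main__":
    env = ToyIntersectionEnv()
    for name, policy in [("fixed-time", fixed_time), ("longest-queue", longest_queue)]:
        print(name, run_episode(env, policy))
```

A trained policy, for example a PPO agent trained with RLlib, would take the place of `longest_queue`; the interaction loop itself stays the same, which is the practical benefit of a Gym-compatible interface.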
