Convergence of Multiagent Learning Systems for Traffic control

Sayambhu Sen; Shalabh Bhatnagar

TL;DR

This paper develops a theoretical foundation for convergence of independent multi-agent Q-learning in traffic signal control (TSC). By modeling TSC as a decentralized MDP with per-junction Q-learning and discretized queue-based states, the authors formulate a stochastic approximation framework that links learning dynamics to a mean-field ODE $\dot{\bar{Q}}(t) = \bar{h}(\bar{Q}(t))$. They prove convergence for both value-iteration-based and asynchronous multi-agent Q-learning via contraction properties (with $\beta < 1$) and Lipschitz drift, leveraging cooperative ODE results and standard stochastic approximation assumptions. A concrete 3-junction example illustrates the interactions among neighboring junctions through turning probabilities and a neighborhood cost function, while the general model extends to arbitrary networks with external and internal lanes and a generalized cost. Overall, the work provides a rigorous theoretical guarantee that independent MARL strategies for TSC can converge under explicit conditions, strengthening the foundation for deploying MARL-based traffic controllers in real networks, with implications for stability and performance guarantees.
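The contraction property mentioned above can be illustrated numerically. The following is a minimal sketch on a randomly generated single-agent MDP, not the paper's traffic model: the helper names (`bellman_operator`, `random_mdp`, `value_iteration`), the 9-state, 2-action sizes, and the value $\beta = 0.9$ are all assumptions chosen for illustration. It checks that a cost-minimising Bellman operator shrinks the sup-norm distance between two Q-tables by at least a factor $\beta$, and that repeated application reaches a fixed point.

```python
import numpy as np


def bellman_operator(Q, P, cost, beta):
    """Apply a cost-minimising Q-Bellman operator to a Q-table.

    Q:    (S, A) array of Q-values
    P:    (S, A, S) array, P[s, a, s2] = probability of moving s -> s2 under a
    cost: (S, A) array of single-stage costs
    beta: discount factor, assumed to satisfy 0 <= beta < 1
    """
    v_next = Q.min(axis=1)           # lowest cost-to-go from each next state
    return cost + beta * P @ v_next  # expected one-step lookahead, shape (S, A)


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical toy MDP with random transitions and random costs."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)  # normalise rows into probabilities
    cost = rng.random((n_states, n_actions))
    return P, cost


def value_iteration(P, cost, beta, tol=1e-10, max_iter=10_000):
    """Iterate the operator until successive Q-tables differ by less than tol."""
    Q = np.zeros_like(cost)
    for _ in range(max_iter):
        Q_new = bellman_operator(Q, P, cost, beta)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


beta = 0.9
P, cost = random_mdp(n_states=9, n_actions=2)

# Two arbitrary Q-tables: one operator step should shrink their sup-norm gap.
rng = np.random.default_rng(1)
Q1, Q2 = rng.normal(size=(2, 9, 2))
gap_before = np.max(np.abs(Q1 - Q2))
gap_after = np.max(
    np.abs(bellman_operator(Q1, P, cost, beta) - bellman_operator(Q2, P, cost, beta))
)

# The fixed point reached by repeated application of the operator.
Q_star = value_iteration(P, cost, beta)
```

Here `gap_after` should be no larger than `beta * gap_before`, and `Q_star` should be left essentially unchanged by one further application of the operator. The paper's multi-agent, asynchronous setting involves additional structure (neighbouring junctions, noisy sampled updates) that this sketch does not attempt to model.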

Abstract

Rapid urbanization in cities like Bangalore has led to severe traffic congestion, making efficient Traffic Signal Control (TSC) essential. Multi-Agent Reinforcement Learning (MARL), which often models each traffic signal as an independent agent using Q-learning, has emerged as a promising strategy to reduce average commuter delays. While prior work by Prashant L A et al. has empirically demonstrated the effectiveness of this approach, its stability and convergence properties in the context of traffic control have not been rigorously analyzed. This paper bridges that gap by focusing squarely on the theoretical basis of this multi-agent algorithm. We investigate the convergence problem inherent in using independent learners for the cooperative TSC task. Utilizing stochastic approximation methods, we formally analyze the learning dynamics. The primary contribution of this work is a proof that this multi-agent reinforcement learning algorithm for traffic control converges under the given conditions, extending single-agent convergence proofs for asynchronous value iteration.
