Toward Dependency Dynamics in Multi-Agent Reinforcement Learning for Traffic Signal Control

Yuli Zhang, Shangbo Wang, Dongyao Jia, Pengfei Fan, Ruiyuan Jiang, Hankang Gu, Andy H. F. Chow

TL;DR

This work addresses the stability and scalability challenges of multi-agent reinforcement learning for traffic signal control by modeling the dependency dynamics between intersections that arise from spill-back congestion. It shows that when spill-back is absent, independent reinforcement learning can match centralized Q-learning. It then introduces DQN-DPUS, a dynamic parameter update strategy that updates either all network weights or only the diagonal blocks, depending on the current dependency state. The approach combines the advantages of centralized and distributed learning, achieving faster convergence and robust performance in congested regimes, as demonstrated on a two-intersection SUMO scenario. These results offer a practical framework for scalable traffic signal control that adapts to changing inter-intersection dependencies in urban networks.
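The claim that independent learners can match a centralized learner without spill-back can be illustrated with a small sketch. It assumes (this is an assumption of the example, not a statement of the paper's derivation) that the global Q-value is the sum of per-intersection Q-values when agents do not depend on each other. The action names, the numeric values, and the `coupling` term used to mimic dependency are all hypothetical.

```python
from itertools import product

# Hypothetical local Q-values for two intersections in one fixed state.
# Keys are local actions (signal phases); the numbers are illustrative only.
q1 = {"NS_green": 1.5, "EW_green": 0.25}
q2 = {"NS_green": -0.5, "EW_green": 2.0}


def centralized_greedy(q_a, q_b, coupling=None):
    """Search the joint action space of the global Q-value.

    Assumed form: Q(a1, a2) = q_a[a1] + q_b[a2] + coupling[(a1, a2)],
    where the coupling term stands in for inter-agent dependency.
    """
    coupling = coupling or {}

    def global_q(joint):
        return q_a[joint[0]] + q_b[joint[1]] + coupling.get(joint, 0.0)

    return max(product(q_a, q_b), key=global_q)


def independent_greedy(q_a, q_b):
    """Each agent maximizes its own local Q-value in isolation."""
    return (max(q_a, key=q_a.get), max(q_b, key=q_b.get))


# No dependency: the additive global Q is maximized by local maximizers.
joint_free = centralized_greedy(q1, q2)
local = independent_greedy(q1, q2)

# Hypothetical dependency: one joint action is penalized, so the
# centralized choice no longer coincides with the independent one.
joint_coupled = centralized_greedy(
    q1, q2, coupling={("NS_green", "EW_green"): -4.0}
)
```

With no coupling term, `joint_free` and `local` select the same joint action; once the hypothetical penalty is added, only the centralized search accounts for it, which is the intuition behind preferring centralized learning under spill-back.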

Abstract

Reinforcement learning (RL) has emerged as a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, with deep neural networks substantially augmenting its learning capabilities. However, centralized RL becomes impractical for ATSC involving multiple agents due to the exceedingly high dimensionality of the joint action space. Multi-agent RL (MARL) mitigates this scalability issue by decentralizing control to local RL agents. Nevertheless, this decentralized method introduces new challenges: the environment becomes partially observable from the perspective of each local agent due to constrained inter-agent communication. Centralized RL and MARL each exhibit distinct strengths and weaknesses, particularly under heavy traffic conditions at intersections. In this paper, we show that MARL can achieve the optimal global Q-value by decomposing into multiple independent reinforcement learning (IRL) processes when no spill-back congestion occurs among agents (intersections), i.e., when there is no agent dependency. In the presence of spill-back congestion (with agent dependency), the maximum global Q-value can be achieved by using centralized RL. Building on these conclusions, we propose a novel Dynamic Parameter Update Strategy for Deep Q-Network (DQN-DPUS), which updates the weights and biases based on the dependency dynamics among agents, i.e., updating only the diagonal sub-matrices in the scenario without spill-back congestion. We validate DQN-DPUS in a simple network with two intersections under varying traffic, and show that the proposed strategy can speed up convergence without sacrificing optimal exploration. The results corroborate our theoretical findings, demonstrating the efficacy of DQN-DPUS in optimizing traffic signal control.
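The diagonal sub-matrix update described in the abstract can be sketched as a masked gradient step on a single layer of a joint network. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer layout (each agent owns a contiguous block of inputs and outputs), the learning rate, the helper names `block_diagonal_mask` and `dpus_step`, and the boolean `spill_back` flag are all hypothetical. How spill-back is detected is not covered here.

```python
def block_diagonal_mask(block_sizes_out, block_sizes_in):
    """Return a mask with 1.0 where a weight links units of the same agent.

    Assumption: agent k owns block_sizes_out[k] output units and
    block_sizes_in[k] input units, laid out contiguously.
    """
    mask = []
    for agent_out, n_out in enumerate(block_sizes_out):
        for _ in range(n_out):
            row = []
            for agent_in, n_in in enumerate(block_sizes_in):
                row.extend([1.0 if agent_in == agent_out else 0.0] * n_in)
            mask.append(row)
    return mask


def dpus_step(weights, grads, mask, spill_back, lr=0.5):
    """One gradient-descent step on a weight matrix.

    With spill-back, every weight is updated (centralized learning).
    Without it, only the diagonal blocks selected by the mask change,
    so each agent's parameters are updated independently.
    """
    return [
        [w - lr * g * (1.0 if spill_back else m)
         for w, g, m in zip(w_row, g_row, m_row)]
        for w_row, g_row, m_row in zip(weights, grads, mask)
    ]


# Two agents, each with 2 inputs and 1 output unit (illustrative sizes).
mask = block_diagonal_mask([1, 1], [2, 2])
weights = [[0.0] * 4 for _ in range(2)]
grads = [[1.0] * 4 for _ in range(2)]  # placeholder gradients

no_spill = dpus_step(weights, grads, mask, spill_back=False)
with_spill = dpus_step(weights, grads, mask, spill_back=True)
```

In this toy setting, the off-diagonal weights stay at zero when `spill_back` is false, so the joint network behaves like two separate per-intersection networks. When `spill_back` is true, the cross-agent weights are trained as well.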

Paper Structure

This paper contains 21 sections, 52 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: Illustration of two scenarios for traffic management: with and without spill-back. In the spill-back scenario, centralized learning is preferred, while distributed learning is more effective in the no-spill-back scenario. Centralized learning coordinates actions among agents to manage congestion, whereas distributed learning allows agents to act independently, improving efficiency without congestion. The choice of algorithm is crucial in different traffic conditions.
  • Figure 2: Spill-back process in ATSC.
  • Figure 3: Overall architecture of the approach.
  • Figure 4: Traffic data streams used (one hour each), with vehicle input increasing from top to bottom. As the vehicle input increases, spill-back at the intersections becomes more pronounced.
  • Figure 5: Performance of four algorithms under different spill-over rates.