MTLight: Efficient Multi-Task Reinforcement Learning for Traffic Signal Control

Liwen Zhu, Peixi Peng, Zongqing Lu, Yonghong Tian

TL;DR

MTLight addresses limited agent observations and poor sample efficiency in multi-agent traffic signal control by learning a hierarchical latent state with a multi-task network. The latent state comprises a task-shared representation and task-specific representations derived from multiple auxiliary traffic predictions, and the policy is conditioned on both. Experiments on CityFlow across several cities and traffic patterns show faster convergence, better asymptotic performance, and robust adaptation to peak-hour flows compared with both conventional methods and existing RL baselines. The work highlights the value of using related auxiliary tasks as priors to guide learning in complex, interconnected traffic networks, with potential extensions through imitation learning and pretraining of the latent module.

Abstract

Traffic signal control has a great impact on alleviating traffic congestion in modern cities. Deep reinforcement learning (RL) has been widely used for this task in recent years, demonstrating promising performance but also facing challenges such as limited performance and sample inefficiency. To address these challenges, MTLight enhances the agent observation with a latent state learned from numerous traffic indicators. Multiple auxiliary, supervised tasks are constructed to learn the latent state, and two types of latent embedding features, the task-specific feature and the task-shared feature, are used to enrich it. Extensive experiments conducted on CityFlow demonstrate that MTLight has leading convergence speed and asymptotic performance. We further simulate a peak-hour pattern in all scenarios, which increases control difficulty, and the results indicate that MTLight is highly adaptable.
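
The observation augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the two auxiliary task names, the random untrained weights, and the choice to feed the shared latent into each task-specific encoder are all assumptions made for illustration, and the supervised auxiliary losses that would train the encoders are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and task names, chosen only for this sketch.
OBS_DIM, INDICATOR_DIM, LATENT_DIM = 8, 16, 4
TASKS = ["flow", "travel_time"]


def make_layer(in_dim, out_dim):
    """Return (W, b) for a randomly initialised dense layer (untrained)."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


def forward(x, layer):
    """Apply a dense layer followed by tanh."""
    weights, bias = layer
    return np.tanh(x @ weights + bias)


# One encoder shared by all auxiliary tasks, plus one encoder per task.
shared_encoder = make_layer(INDICATOR_DIM, LATENT_DIM)
specific_encoders = {task: make_layer(LATENT_DIM, LATENT_DIM) for task in TASKS}


def augment_observation(obs, indicators):
    """Concatenate the raw observation with shared and task-specific latents."""
    o_shr = forward(indicators, shared_encoder)
    # Assumption: each task-specific latent is computed from the shared one.
    o_spe = np.concatenate(
        [forward(o_shr, specific_encoders[task]) for task in TASKS]
    )
    return np.concatenate([obs, o_shr, o_spe])


obs = rng.normal(size=OBS_DIM)               # local intersection observation
indicators = rng.normal(size=INDICATOR_DIM)  # network-level traffic indicators
augmented = augment_observation(obs, indicators)
```

In this sketch the policy network would take `augmented` in place of `obs`, so the first `OBS_DIM` entries are the unchanged local observation and the remaining entries carry the latent state.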

Paper Structure

This paper contains 25 sections, 3 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: Multi-Task module forms task-shared and task-specific latent states to enhance the agent observation.
  • Figure 2: MTLight consists of a multi-task network and a policy network. The RL agent is augmented with a task-shared latent state $\mathbf{o}_{t}^{\mathrm{shr}}$ and a task-specific latent state $\mathbf{o}_{t}^{\mathrm{spe}}$.
  • Figure 3: Illustration of the strategies of all RL methods under the real configuration in Hangzhou.
  • Figure 4: Performance of RL methods under real configurations.
  • Figure 5: Performance of RL methods under synthetic peak configurations.
  • ...and 3 more figures