Bayesian Critique-Tune-Based Reinforcement Learning with Adaptive Pressure for Multi-Intersection Traffic Signal Control

Wenchang Duan, Zhenguo Gao, Jiwan He, Jinguo Xian

TL;DR

Addressing unreliable policies in RL-based ATSC, this work introduces BCT-APLight, which couples a Bayesian Critique-Tune (CT) framework with an attention-based Adaptive Pressure (AP) mechanism to refine multi-intersection signal control. The CT framework uses a prediction network to forecast the reward $\hat{r}_{t+h}$, together with a two-layer loop. The Critique Layer constructs a Bayesian credible interval $CI_{Bayes}$ from historical rewards using SARIMA-based priors. The Tune Layer then minimizes the posterior risk $\mathcal{R}_{post}$ to adjust actions when needed. The AP mechanism provides a traffic-aware representation by weighting upstream-downstream lane interactions with attention, and it feeds an AP-based DQN that jointly optimizes phases with stable training. Empirically, BCT-APLight outperforms seven baselines across seven real-world datasets. It reduces average queue length ($\Delta\text{AQL}$) by 9.60% and average waiting time ($\Delta\text{AWT}$) by 15.28%, demonstrating robustness and scalability for large urban networks.
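
The Critique Layer's credibility check can be illustrated with a minimal sketch under stated assumptions. It uses a conjugate normal model. The prior mean for the next reward stands in for the SARIMA forecast and is passed in as a plain number rather than fitted here. The reward noise variance is a plug-in estimate from the history. A forecast $\hat{r}_{t+h}$ that falls outside the posterior predictive interval counts as a negative evaluation. The helper names (`bayes_predictive_interval`, `critique`) are hypothetical, and the paper's exact construction of $CI_{Bayes}$ may differ.

```python
from dataclasses import dataclass
from statistics import NormalDist, variance
from typing import Sequence


@dataclass(frozen=True)
class CredibleInterval:
    lower: float
    upper: float

    def contains(self, x: float) -> bool:
        return self.lower <= x <= self.upper


def bayes_predictive_interval(history: Sequence[float], prior_mean: float,
                              prior_var: float, level: float = 0.95) -> CredibleInterval:
    """Posterior predictive interval for the next reward (normal-normal model).

    Assumption: prior_mean stands in for a SARIMA forecast; it is supplied, not fitted here.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two historical rewards")
    # Plug-in noise variance from past rewards, floored to avoid division by zero.
    noise_var = max(variance(history), 1e-8)
    # Conjugate update of the mean reward: prior and data precisions add.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(history) / noise_var)
    # Predictive spread = uncertainty about the mean + observation noise.
    pred_sd = (post_var + noise_var) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return CredibleInterval(post_mean - z * pred_sd, post_mean + z * pred_sd)


def critique(predicted_reward: float, history: Sequence[float], prior_mean: float,
             prior_var: float, level: float = 0.95) -> bool:
    """True = trust the RL action; False = negative evaluation, hand over to the Tune Layer."""
    ci = bayes_predictive_interval(history, prior_mean, prior_var, level)
    return ci.contains(predicted_reward)


history = [-12.0, -10.5, -11.2, -9.8, -10.9]  # e.g. rewards as negative queue lengths
print(critique(-10.0, history, prior_mean=-10.6, prior_var=4.0))  # True
print(critique(-20.0, history, prior_mean=-10.6, prior_var=4.0))  # False
```

A forecast close to recent rewards passes the check. An outlying forecast is flagged, which is the case the Tune Layer is meant to handle.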

Abstract

Adaptive Traffic Signal Control (ATSC) systems are a critical component of intelligent transportation and can significantly alleviate urban traffic congestion. Although reinforcement learning (RL)-based methods have shown promising performance in ATSC, existing methods are still prone to producing unreasonable policies. Therefore, this paper proposes a novel Bayesian Critique-Tune-Based Reinforcement Learning with Adaptive Pressure for multi-intersection signal control (BCT-APLight). In BCT-APLight, the Critique-Tune (CT) framework, a two-layer Bayesian structure, is designed to mitigate excessive trust in RL policies. Specifically, the Bayesian inference-based Critique Layer evaluates the credibility of policies. When these evaluations are negative, the Bayesian decision-based Tune Layer fine-tunes the policies by minimizing the posterior risk. Meanwhile, an attention-based Adaptive Pressure (AP) mechanism weights the vehicle queues in each lane, yielding a more rational representation of traffic movements within the network. Equipped with the CT framework and the AP mechanism, BCT-APLight makes RL policies more reasonable. Extensive simulator experiments across a range of intersection layouts show that BCT-APLight outperforms other state-of-the-art (SOTA) methods on seven real-world datasets. Specifically, BCT-APLight decreases average queue length by 9.60% and average waiting time by 15.28%.
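
The Tune Layer's "minimize the posterior risk" step is a Bayes decision rule, which a minimal sketch can illustrate under stated assumptions. Each candidate phase is assumed to have a posterior over its next reward, summarized only by its mean $\mu_a$ and variance $\sigma_a^2$. The sketch uses a squared-shortfall loss $L(a, r) = (r^\ast - r)^2$ against a target reward $r^\ast$. Its posterior risk has the closed form $(r^\ast - \mu_a)^2 + \sigma_a^2$ for any posterior with those moments. The loss and the target $r^\ast = 0$ (zero queue when rewards are negative queue lengths) are illustrative assumptions, not the paper's definitions. The same applies to the names `RewardPosterior`, `posterior_risk`, and `tune`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RewardPosterior:
    mean: float  # posterior mean of the next reward for this phase
    var: float   # posterior variance of the next reward for this phase


def posterior_risk(post: RewardPosterior, target: float = 0.0) -> float:
    """Posterior expected squared shortfall E[(target - r)^2] = (target - mean)^2 + var."""
    return (target - post.mean) ** 2 + post.var


def tune(rl_action: int, critique_ok: bool,
         posteriors: Dict[int, RewardPosterior], target: float = 0.0) -> int:
    """Keep the RL action if the critique is positive; otherwise return the Bayes action."""
    if critique_ok:
        return rl_action
    return min(posteriors, key=lambda a: posterior_risk(posteriors[a], target))


posteriors = {
    0: RewardPosterior(mean=-8.0, var=1.0),  # risk 65.0
    1: RewardPosterior(mean=-6.0, var=9.0),  # risk 45.0
    2: RewardPosterior(mean=-6.5, var=1.0),  # risk 43.25
}
print(tune(rl_action=1, critique_ok=True, posteriors=posteriors))   # 1
print(tune(rl_action=1, critique_ok=False, posteriors=posteriors))  # 2
```

Phase 1 has the highest posterior mean but a wide posterior. After a negative critique, the risk-based rule therefore prefers phase 2, which trades a slightly lower mean for much less uncertainty.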

Paper Structure

This paper contains 32 sections, 41 equations, 8 figures, 2 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Architecture of BCT-APLight. The urban traffic environment for ATSC (a) offers complex traffic dynamics and an interactive framework for reinforcement learning (RL). Attention-based Adaptive Pressure (b) enables RL agents to effectively capture traffic features. This adaptive pressure, combined with traffic lane details, enhances DQN-based traffic signal control (c). The Bayesian-based Critique-Tune framework (d) evaluates and refines RL policies for improved decision-making.
  • Figure 2: The traditional efficient pressure and the attention-based adaptive pressure are shown in (a). The eight traffic signals are illustrated in (b).
  • Figure 3: Framework of Bayesian Critique-Tune for RL.
  • Figure 4: Architecture of attention-based adaptive pressure extraction for each intersection direction.
  • Figure 5: The road networks of the Jinan, Hangzhou, and New York datasets, drawn at uniform scale. Blue dots mark the traffic signals controlled by RL agents.
  • ...and 3 more figures
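
Figures 2 and 4 contrast the traditional efficient pressure with the attention-based adaptive pressure. A minimal sketch of that contrast follows, under two assumptions. First, efficient pressure is taken as the uniform average queue on upstream lanes minus the uniform average queue on downstream lanes. Second, the adaptive variant replaces each uniform average with softmax attention weights. These weights come from a dot product between a direction query and per-lane feature vectors. The query, the lane features, and `scale` are hypothetical stand-ins for learned parameters. The paper's extraction network (Figure 4) is not reproduced here.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def efficient_pressure(upstream: Sequence[float], downstream: Sequence[float]) -> float:
    """Assumed traditional form: mean upstream queue minus mean downstream queue."""
    return sum(upstream) / len(upstream) - sum(downstream) / len(downstream)


def adaptive_pressure(upstream: Sequence[float], downstream: Sequence[float],
                      up_feats: Sequence[Sequence[float]],
                      down_feats: Sequence[Sequence[float]],
                      query: Sequence[float], scale: float = 1.0) -> float:
    """Attention-weighted pressure; query/features/scale are hypothetical learned inputs."""
    def scores(feats: Sequence[Sequence[float]]) -> List[float]:
        # Scaled dot product between the direction query and each lane's features.
        return [scale * sum(q * f for q, f in zip(query, fv)) for fv in feats]

    w_up = softmax(scores(up_feats))
    w_down = softmax(scores(down_feats))
    up = sum(w * q for w, q in zip(w_up, upstream))
    down = sum(w * q for w, q in zip(w_down, downstream))
    return up - down


up, down = [6.0, 2.0, 4.0], [1.0, 3.0]
print(efficient_pressure(up, down))  # 2.0
```

When all lanes share identical features, the attention weights are uniform and the two pressures coincide. Differences in lane features let congested lanes dominate the weighted queue.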