Bayesian Critique-Tune-Based Reinforcement Learning with Adaptive Pressure for Multi-Intersection Traffic Signal Control
Wenchang Duan, Zhenguo Gao, Jiwan He, Jinguo Xian
TL;DR
To address unreliable policies in RL-based ATSC, this work introduces BCT-APLight, which couples a Bayesian Critique-Tune (CT) framework with an attention-based Adaptive Pressure (AP) mechanism to refine multi-intersection signal control. The CT framework uses a prediction network to forecast $\hat{r}_{t+h}$ and a two-layer loop that constructs $CI_{Bayes}$ from historical rewards via SARIMA priors and, when needed, adjusts actions by minimizing the posterior risk $\mathcal{R}_{post}$. The AP mechanism provides a traffic-aware representation by weighting upstream-downstream lane interactions with attention, and feeds an AP-based DQN that jointly optimizes phases with stable training. Empirically, BCT-APLight outperforms seven baselines across seven real-world datasets, achieving substantial reductions in average queue length $\Delta\text{AQL}$ and average waiting time $\Delta\text{AWT}$ and demonstrating robustness and scalability for large urban networks.
Abstract
Adaptive Traffic Signal Control (ATSC) is a critical component of intelligent transportation systems and can significantly alleviate urban traffic congestion. Although reinforcement learning (RL)-based methods have shown promising performance for ATSC, existing methods are still prone to producing unreasonable policies. This paper therefore proposes BCT-APLight, a novel Bayesian Critique-Tune-Based Reinforcement Learning method with Adaptive Pressure for multi-intersection signal control. In BCT-APLight, the Critique-Tune (CT) framework is a two-layer Bayesian structure designed to correct excessive trust in RL policies. Specifically, the Bayesian inference-based Critique Layer evaluates the credibility of policies, and the Bayesian decision-based Tune Layer fine-tunes policies by minimizing the posterior risk when the evaluation is negative. Meanwhile, an attention-based Adaptive Pressure (AP) mechanism weights the vehicle queues in each lane, yielding a more reasonable representation of traffic movement within the network. Equipped with the CT framework and the AP mechanism, BCT-APLight improves the reasonableness of RL policies. Extensive experiments conducted with a simulator across a range of intersection layouts show that BCT-APLight outperforms other state-of-the-art (SOTA) methods on seven real-world datasets. Specifically, BCT-APLight decreases average queue length by $\mathbf{9.60\%}$ and average waiting time by $\mathbf{15.28\%}$.
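To make the critique-then-tune idea concrete, the following is a minimal sketch under simplifying assumptions, not the paper's implementation. It assumes a normal-normal conjugate model in place of the SARIMA prior, a known reward noise variance, and a squared-error loss for the posterior risk. The function names `critique_interval` and `critique_tune`, the action labels, and all numeric values are hypothetical.

```python
import math
from statistics import NormalDist


def critique_interval(history, prior_mean, prior_var, noise_var, level=0.95):
    """Critique step (assumed model): normal prior on the mean reward,
    normal noise with known variance. Returns the posterior mean and a
    posterior-predictive interval for the next reward."""
    n = len(history)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(history) / noise_var)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Predictive spread = uncertainty about the mean + observation noise.
    half_width = z * math.sqrt(post_var + noise_var)
    return post_mean, (post_mean - half_width, post_mean + half_width)


def critique_tune(chosen, predicted, history, prior_mean, prior_var, noise_var):
    """Keep the RL action if its predicted reward is credible; otherwise
    pick the action minimizing an assumed squared-error posterior risk."""
    post_mean, (low, high) = critique_interval(
        history, prior_mean, prior_var, noise_var
    )
    if low <= predicted[chosen] <= high:
        return chosen  # critique is positive: trust the RL policy
    # Under squared-error loss the posterior risk of action a is
    # (predicted[a] - post_mean)**2 + post_var; post_var is constant in a.
    return min(predicted, key=lambda a: (predicted[a] - post_mean) ** 2)
```

For example, with historical rewards `[-10.0, -12.0, -11.0, -9.0]`, a prior mean of `-10.0`, and prior and noise variances of `4.0`, an RL action with a predicted reward of `-20.0` falls outside the interval and is replaced, while one with a predicted reward of `-11.0` is kept.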
