How to ensure a safe control strategy? Towards a SRL for urban transit autonomous operation
Zicong Zhao
TL;DR
This work addresses the lack of safety guarantees when deep reinforcement learning is applied to urban rail autonomous operation. It proposes SSA-DRL, a framework that combines a post-posed LTL-based Shield, a Safe Action Searching Tree, a DRL core, and an Additional actor to produce safe, schedule-compliant, and energy-efficient control commands. The approach delivers stronger safety guarantees, reduced reliance on protective overrides, and improved robustness and transferability across sections, with theoretical and empirical support from simulations and ablation studies. In practice, SSA-DRL offers a scalable path toward reliable, explainable autonomous rail operation that respects safety, timing, and energy objectives in real-world transit networks.
Abstract
Deep reinforcement learning has gradually shown its potential for decision-making in urban rail transit autonomous operation. However, reinforcement learning can guarantee safety neither during learning nor during execution, and this remains one of the major obstacles to its practical application. Because of this drawback, applying reinforcement learning in the safety-critical autonomous operation domain remains challenging unless it generates a safe control command sequence that avoids overspeed operation. Therefore, this paper proposes an SSA-DRL framework for safe intelligent control of trains in urban rail transit autonomous operation. The proposed framework combines linear temporal logic, reinforcement learning and Monte Carlo tree search, and consists of four main modules: a post-posed shield, a searching tree module, a DRL framework and an additional actor. The output of the framework meets the speed constraint and the schedule constraint while optimizing the operation process. Finally, the proposed SSA-DRL framework for decision-making in urban rail transit autonomous operation is evaluated in sixteen different sections, and its effectiveness is demonstrated through an ablation experiment and a comparison with the scheduled operation plan.
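
The post-posed shielding idea named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the point-mass speed update, the one-step overspeed check (standing in for an LTL safety property), the discrete candidate actions, and the function names `next_speed`, `is_safe`, and `shield` are all assumptions made for illustration. A simple nearest-safe-action search also replaces the Monte Carlo tree search used by the framework.

```python
from typing import Sequence


def next_speed(speed: float, accel: float, dt: float = 1.0) -> float:
    """Hypothetical point-mass update: speed (m/s) after applying accel for dt seconds."""
    return max(0.0, speed + accel * dt)


def is_safe(speed: float, accel: float, speed_limit: float, dt: float = 1.0) -> bool:
    """One-step check standing in for the safety property 'never overspeed'."""
    return next_speed(speed, accel, dt) <= speed_limit


def shield(
    speed: float,
    proposed: float,
    speed_limit: float,
    candidates: Sequence[float],
    dt: float = 1.0,
) -> float:
    """Post-posed shield: inspect the agent's action after it is chosen.

    A safe action passes through unchanged. An unsafe one is replaced by the
    safe candidate closest to it. If no candidate is safe, the strongest
    braking candidate is returned as a fallback (an assumption of this sketch).
    """
    if is_safe(speed, proposed, speed_limit, dt):
        return proposed
    safe = [a for a in candidates if is_safe(speed, a, speed_limit, dt)]
    if not safe:
        return min(candidates)
    return min(safe, key=lambda a: abs(a - proposed))


# Hypothetical discrete accelerations in m/s^2.
ACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]
corrected = shield(speed=19.5, proposed=1.0, speed_limit=20.0, candidates=ACTIONS)
```

In this sketch the agent's proposal of 1.0 m/s^2 at 19.5 m/s would exceed a 20 m/s limit within one step, so the shield substitutes the nearest candidate that keeps the train at or below the limit. Because the shield acts only after the agent has chosen, the learning core is left unmodified, which is the property the term "post-posed" suggests.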
