How to ensure a safe control strategy? Towards a SRL for urban transit autonomous operation

Zicong Zhao

TL;DR

This work addresses the lack of safety guarantees when deep reinforcement learning is applied to urban rail autonomous operation. It proposes SSA-DRL, a framework that combines a post-posed LTL-based Shield, a Safe Action Searching Tree, a DRL core, and an Additional actor to produce safe, schedule-compliant, and energy-efficient control commands. The approach delivers stronger safety guarantees, reduced reliance on protective overrides, and enhanced robustness and transferability across sections, with theoretical and empirical support from simulations and ablation studies. Practically, SSA-DRL offers a scalable path toward reliable, explainable autonomous rail operation that respects safety, timing, and energy objectives in real-world transit networks.

Abstract

Deep reinforcement learning has gradually shown its latent decision-making ability in urban rail transit autonomous operation. However, reinforcement learning can guarantee safety neither during learning nor during execution, which remains one of the major obstacles to its practical application. Given this drawback, applying reinforcement learning in the safety-critical autonomous operation domain remains challenging unless it can generate a safe control command sequence that avoids overspeed operation. Therefore, this paper proposes an SSA-DRL framework for the safe intelligent control of autonomously operated urban rail transit trains. The proposed framework combines linear temporal logic, reinforcement learning, and Monte Carlo tree search, and consists of four main modules: a post-posed shield, a searching tree module, a DRL framework, and an additional actor. The output of the framework satisfies the speed constraint and the schedule constraint while optimizing the operation process. Finally, the SSA-DRL framework is evaluated in sixteen different sections, and its effectiveness is demonstrated through an ablation experiment and a comparison with the scheduled operation plan.
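
To make the idea of a post-posed shield concrete, the following is a minimal sketch, not the paper's implementation. It assumes a point-mass train model, a single invariant of the form "the speed never exceeds the limit" checked one step ahead, and a nearest-safe-candidate fallback standing in for the searching tree module. All names (`TrainState`, `step`, `is_safe`, `shield`) and numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainState:
    position_m: float
    speed_mps: float


def step(state: TrainState, accel: float, dt: float = 1.0) -> TrainState:
    """Advance a point-mass train by one time step (assumed simplification)."""
    new_speed = max(0.0, state.speed_mps + accel * dt)
    new_pos = state.position_m + 0.5 * (state.speed_mps + new_speed) * dt
    return TrainState(new_pos, new_speed)


def is_safe(state: TrainState, accel: float, limit_mps: float, dt: float = 1.0) -> bool:
    """One-step check of the invariant: next speed must not exceed the limit."""
    return step(state, accel, dt).speed_mps <= limit_mps


def shield(state, proposed_accel, candidates, limit_mps, dt=1.0):
    """Post-posed shield: inspect the actor's action after it is proposed.

    Returns (action, overridden). A safe proposal passes through unchanged.
    An unsafe one is replaced by the safe candidate closest to the proposal;
    this nearest-candidate rule is an assumption, not the paper's tree search.
    """
    if is_safe(state, proposed_accel, limit_mps, dt):
        return proposed_accel, False
    safe = [a for a in candidates if is_safe(state, a, limit_mps, dt)]
    if not safe:
        # No candidate keeps the invariant: fall back to the strongest braking.
        return min(candidates), True
    return min(safe, key=lambda a: abs(a - proposed_accel)), True


candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]  # accelerations in m/s^2 (illustrative)
action, overridden = shield(TrainState(0.0, 19.5), 1.0, candidates, limit_mps=20.0)
print(action, overridden)
```

In this sketch the shield only intervenes when the proposed command would cause overspeed, which mirrors the goal of reducing reliance on protective overrides; a fuller version would also have to account for the schedule constraint.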

Paper Structure

This paper contains 28 sections, 1 theorem, 27 equations, 11 figures, 6 tables, 2 algorithms.

Key Result

Lemma 1

Among policies that yield a safe action, the policy $\pi_{sa}$ that produces $a_{sa}$ is better than any other policy $\pi_{sa}^{'}$. Moreover, $\pi_{sa}$ is no worse than the original policy $\mu^{\theta}$ at obtaining a safe action.

Figures (11)

  • Figure 1: Framework of SSA-DRL.
  • Figure 2: Structure of post-posed Shield.
  • Figure 3: Framework of safe action searching tree.
  • Figure 4: Traction and braking characteristic.
  • Figure 5: Speed profiles in different sections.
  • ...and 6 more figures

Theorems & Definitions (2)

  • Lemma 1
  • proof