How to ensure a safe control strategy? Towards a SRL for urban transit autonomous operation

Zicong Zhao

TL;DR

This work addresses the gap in safety guarantees that arises when deep reinforcement learning is applied to urban rail autonomous operation. It proposes SSA-DRL, a framework that combines a post-posed LTL-based Shield, a Safe Action Searching Tree, a DRL core, and an Additional actor to produce control commands that are safe, schedule-compliant, and energy-efficient. The approach is reported to strengthen safety guarantees, reduce reliance on protective overrides, and improve robustness and transferability across sections, supported by theoretical analysis, simulations, and ablation studies. In practice, SSA-DRL offers a path toward reliable, explainable autonomous rail operation that respects safety, timing, and energy objectives in transit networks.

Abstract

Deep reinforcement learning has gradually shown its potential for decision-making in urban rail transit autonomous operation. However, reinforcement learning can guarantee safety neither during learning nor during execution, and this remains one of the major obstacles to its practical application. Because of this drawback, applying reinforcement learning in the safety-critical autonomous operation domain remains challenging unless it can generate a safe control command sequence that avoids overspeed operation. This paper therefore proposes the SSA-DRL framework for safe intelligent control of autonomously operated urban rail transit trains. The framework combines linear temporal logic, reinforcement learning, and Monte Carlo tree search, and consists of four main modules: a post-posed shield, a searching tree module, a DRL framework, and an additional actor. The output of the framework satisfies the speed constraint and the schedule constraint while optimizing the operation process. Finally, the SSA-DRL framework is evaluated on sixteen different sections, and its effectiveness is demonstrated through an ablation experiment and a comparison with the scheduled operation plan.
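To make the idea of a post-posed shield concrete, the following is a minimal sketch and not the paper's implementation. It assumes a point-mass train model, a made-up piecewise speed limit, and a small discrete set of candidate accelerations. All names (`TrainState`, `speed_limit`, `shield`) are hypothetical. A one-step overspeed check stands in for the LTL-based shield, and a nearest-safe-candidate scan stands in for the Monte Carlo tree search used by the Safe Action Searching Tree.

```python
from dataclasses import dataclass


@dataclass
class TrainState:
    position_m: float  # distance from the start of the section (m)
    speed_mps: float   # current speed (m/s)


def speed_limit(position_m: float) -> float:
    """Hypothetical piecewise speed limit (m/s); not taken from the paper."""
    return 15.0 if position_m < 500.0 else 20.0


def step(state: TrainState, accel: float, dt: float = 1.0) -> TrainState:
    """Advance a point-mass train by dt seconds; speed never drops below zero."""
    new_speed = max(0.0, state.speed_mps + accel * dt)
    new_pos = state.position_m + 0.5 * (state.speed_mps + new_speed) * dt
    return TrainState(new_pos, new_speed)


def is_safe(state: TrainState, accel: float, dt: float = 1.0) -> bool:
    """One-step check: the next state must not exceed the local speed limit."""
    nxt = step(state, accel, dt)
    return nxt.speed_mps <= speed_limit(nxt.position_m)


def shield(state, proposed_accel, candidates=(-1.0, -0.5, 0.0, 0.5, 1.0), dt=1.0):
    """Check the learner's action after it is proposed.

    Returns (action, overridden). A safe proposal passes through unchanged.
    Otherwise the safe candidate closest to the proposal is chosen, and the
    strongest braking candidate is used if no candidate passes the check.
    """
    if is_safe(state, proposed_accel, dt):
        return proposed_accel, False
    safe = [a for a in candidates if is_safe(state, a, dt)]
    if not safe:
        return min(candidates), True
    return min(safe, key=lambda a: abs(a - proposed_accel)), True


if __name__ == "__main__":
    # Near the limit, full traction would overspeed, so the shield overrides it.
    print(shield(TrainState(100.0, 14.8), 1.0))
    # Well below the limit, the proposed action is passed through.
    print(shield(TrainState(100.0, 10.0), 1.0))
```

The check here looks only one step ahead, so it ignores braking distance before an upcoming lower limit. A realistic shield would need a longer lookahead or a braking-curve model.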

Paper Structure (28 sections, 1 theorem, 27 equations, 11 figures, 6 tables, 2 algorithms)

This paper contains 28 sections, 1 theorem, 27 equations, 11 figures, 6 tables, 2 algorithms.

Introduction
Related Work
Problems and Contributions
Preliminaries
Markov Decision Process
State Value and Action Value
Off Policy DRL
Monte Carlo Tree Search
Linear Temporal Logic
Method Formulation
A Post-Posed Shield
Safe Action Searching Tree
DRL based guiding learner
Optimality and Convergence Analysis
Optimality
...and 13 more sections

Key Result

Lemma 1

Among policies that yield a safe action, the policy $\pi_{sa}$ that yields $a_{sa}$ is better than any other such policy $\pi_{sa}^{'}$. Moreover, $\pi_{sa}$ is no worse than the original policy $\mu^{\theta}$ at yielding a safe action.

Figures (11)

Figure 1: Framework of SSA-DRL.
Figure 2: Structure of post-posed Shield.
Figure 3: Framework of safe action searching tree.
Figure 4: Traction and braking characteristic.
Figure 5: Speed profiles in different sections.
...and 6 more figures

Theorems & Definitions (2)

Lemma 1
proof
