Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea

Hanna Krasowski; Matthias Althoff

Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea

Hanna Krasowski, Matthias Althoff

TL;DR

This work tackles the challenge of enforcing traffic-rule compliance in reinforcement learning for autonomous vessels operating on the open sea. It introduces a provably safe RL framework that integrates temporal-logic formalizations of COLREGS into an online safety verifier, realized through a rule-compliant statechart and an emergency controller. Actions are restricted to a verified set $\mathcal{A}_s$ via action masking, ensuring that learned policies produce trajectories that satisfy the rulebook $\langle \Phi, \leq \rangle$. Experimental results demonstrate that the safe agent achieves zero collisions and zero rule violations in both handcrafted and recorded traffic scenarios, while baseline and rule-informed agents suffer from safety violations when verification is active, underscoring the practical impact of provable safety in maritime RL.

Abstract

For safe operation, autonomous vehicles have to obey traffic rules that are set forth in legal documents formulated in natural language. Temporal logic is a suitable concept to formalize such traffic rules. Still, temporal logic rules often result in constraints that are hard to solve using optimization-based motion planners. Reinforcement learning (RL) is a promising method to find motion plans for autonomous vehicles. However, vanilla RL algorithms are based on random exploration and do not automatically comply with traffic rules. Our approach accomplishes guaranteed rule-compliance by integrating temporal logic specifications into RL. Specifically, we consider the application of vessels on the open sea, which must adhere to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS). To efficiently synthesize rule-compliant actions, we combine predicates based on set-based prediction with a statechart representing our formalized rules and their priorities. Action masking then restricts the RL agent to this set of verified rule-compliant actions. In numerical evaluations on critical maritime traffic situations, our agent always complies with the formalized legal rules and never collides while achieving a high goal-reaching rate during training and deployment. In contrast, vanilla and traffic rule-informed RL agents frequently violate traffic rules and collide even after training.

Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea

TL;DR

via action masking, ensuring that learned policies produce trajectories that satisfy the rulebook

. Experimental results demonstrate that the safe agent achieves zero collisions and zero rule violations in both handcrafted and recorded traffic scenarios, while baseline and rule-informed agents suffer from safety violations when verification is active, underscoring the practical impact of provable safety in maritime RL.

Abstract

Paper Structure (38 sections, 4 theorems, 30 equations, 8 figures, 4 tables, 3 algorithms)

This paper contains 38 sections, 4 theorems, 30 equations, 8 figures, 4 tables, 3 algorithms.

Introduction
Related Work
Safety specification for maritime motion planning
Motion planning for vessels
Provably safe RL
Preliminaries and Problem Statement
Notation and dynamics
Set-based prediction of vessels
Problem statement
Specification
Traffic Rulebook
Emergency Rule Predicates
Specification-compliant Statechart
Rule-Compliant Maneuver Synthesis
Emergency Maneuver
...and 23 more sections

Key Result

Proposition 1

For the states $\rho_i, \; \forall i \in \{1,...,4\}$, the predicate $\mathrm{collision\_possible}$ is true.

Figures (8)

Figure 1: Proposed provably safe rl approach for autonomous vessels. First, traffic rules for collision avoidance are formalized with temporal logic (see \ref{['ch:prelim']}). Based on the formal specification, the set of rule-compliant actions is identified (see \ref{['ch:online_verification']} and \ref{['ch:maneuver_synthesis']}), which are integrated in the rl process so that the agent can only select actions that are rule-compliant (see \ref{['ch:reinforcement_learning']}). Note that the statechart in \ref{['fig:statechart']} details the computation of verified rue-compliant actions and comprises two modes: normal operation and emergency operation. The resulting safe agent achieves rule compliance and collision avoidance during training and deployment, while agents without the safety verification of actions violate the formalized traffic rules and collide still after training (see \ref{['ch:experiments']}).
Figure 2: Encounter situations and rule-compliant maneuvers specified in the colregs (adapted from Krasowski.2021).
Figure 3: Statechart $\Gamma$ modeling the legal safety specification with predicates at the transitions. The states for the regular collision avoidance rules $R_3$ - $R_6$ are depicted in blue and the emergency operation state for rule $R_1$ in red. For safety verification of actions, the algorithms identifying the set of rule-compliant actions (indicated in brackets) are employed given the current state $\rho_i$ of the statechart.
Figure 4: Emergency controller modes with set-based occupancy prediction of obstacle vessel in orange and the occupancy of the ego vessel in blue for several time intervals. The orientation of the ego vessel and the obstacle vessel are indicated with dashed lines and emergency maneuver is depicted by green arrows or occupancies. The green cross indicates the target position for the base and ahead modes. The sectors, for which the predicate $\mathrm{in\_sector}$ is true, are shown in gray for the ahead and stern mode. The visualization of the sectors includes the arguments of predicate $\mathrm{in\_sector}$ in dark blue and the point of origin in black.
Figure 5: Visualization of turning direction cases. The obstacle vessel is depicted in orange and the ego vessel in blue. Arrows indicate orientations and positions are marked with dots. The turning direction case is indicated by the superscript. For cases 1 and 3, the ego vessel should turn right and for the cases 2 and 4, the ego vessel should turn left.
...and 3 more figures

Theorems & Definitions (5)

Definition 1: Rules $\Phi$
Proposition 1
Lemma 1
Theorem 1
Theorem 2

Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea

TL;DR

Abstract

Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (8)

Theorems & Definitions (5)