Modular Control Architecture for Safe Marine Navigation: Reinforcement Learning and Predictive Safety Filters

Aksel Vaaler; Svein Jostein Husa; Daniel Menges; Thomas Nakken Larsen; Adil Rasheed

TL;DR

The paper tackles safety in learning-based autonomous marine navigation by integrating a modular Predictive Safety Filter (PSF) with model-free reinforcement learning (RL) to enforce physical and collision constraints during both training and operation. It builds on a 3-DOF Cybership II model, a fuzzy collision-risk metric, a disturbance observer, and an MPC-like PSF that optimally adjusts proposed actions so that the vessel can still reach a safe terminal set, achieving real-time feasibility with solver runtimes under 10 ms. Key contributions include the design and verification of the PSF for marine collision avoidance, a demonstration of improved safety and learning efficiency across randomized scenarios, and successful transfer to tests in realistic environments. The results show that the PSF can dramatically reduce collisions during learning while preserving or even accelerating policy convergence, making RL-based marine navigation more practical and safer for real deployments.

Abstract

Many autonomous systems face safety challenges, requiring robust closed-loop control to handle physical limitations and safety constraints. Real-world systems, like autonomous ships, encounter nonlinear dynamics and environmental disturbances. Reinforcement learning (RL) is increasingly used to adapt to complex scenarios, but standard frameworks ensuring safety and stability are lacking. Predictive Safety Filters (PSF) offer a promising solution, ensuring constraint satisfaction in learning-based control without explicit constraint handling. This modular approach allows using arbitrary control policies, with the safety filter optimizing proposed actions to meet physical and safety constraints. We apply this approach to marine navigation, combining RL with PSF on a simulated Cybership II model. The RL agent is trained on path following and collision avoidance, while the PSF monitors and modifies control actions for safety. Results demonstrate the PSF's effectiveness in maintaining safety without hindering the RL agent's learning rate and performance, evaluated against a standard RL agent without PSF.
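The filtering idea described in the abstract can be illustrated with a toy sketch. It is not the paper's formulation: it assumes a one-dimensional double integrator instead of the 3-DOF Cybership II model, a one-step horizon, a grid search in place of an optimal control solver, and a terminal set defined as "full braking from here never crosses the wall". All names and constants (`DT`, `U_MAX`, `P_MAX`, `safety_filter`) are hypothetical.

```python
# Toy predictive safety filter (illustrative assumptions only, not the paper's PSF).
DT, U_MAX, P_MAX = 0.1, 1.0, 10.0  # hypothetical time step, input bound, wall position


def step(p, v, u):
    """Discrete double integrator: position p, velocity v, acceleration u."""
    return p + DT * v, v + DT * u


def can_brake_safely(p, v):
    """Terminal-set check: full braking from (p, v) never crosses P_MAX."""
    while v > 0.0:
        if p > P_MAX:
            return False
        p, v = step(p, v, -U_MAX)
    return p <= P_MAX


def safety_filter(p, v, u_nominal, n_candidates=201):
    """Return the admissible input closest to u_nominal whose successor state is safe."""
    candidates = [-U_MAX + 2.0 * U_MAX * i / (n_candidates - 1)
                  for i in range(n_candidates)]
    safe = [u for u in candidates if can_brake_safely(*step(p, v, u))]
    if not safe:
        return -U_MAX  # no safe candidate found: fall back to full braking
    return min(safe, key=lambda u: abs(u - u_nominal))


# Far from the wall, the nominal (e.g. RL-proposed) action passes through unchanged.
u_far = safety_filter(p=0.0, v=0.0, u_nominal=0.5)
# Close to the wall, the filter overrides full acceleration with braking.
u_near = safety_filter(p=9.4, v=1.0, u_nominal=1.0)
```

The filter is agnostic to where `u_nominal` comes from, which is the modularity the abstract refers to: any policy can propose actions, and only the final check against the constraints and the terminal set decides what is executed.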

Paper Structure (51 sections, 56 equations, 15 figures, 7 tables)

Introduction
Theory
Ship modeling
Kinematics
Dynamics
Collision risk
Deep reinforcement learning
RL preliminaries
Policy-based and value-based methods
Proximal policy optimization
Environmental disturbance observer
Predictive Safety Filter
Formulation of PSF OCP for the 3-DOF vessel model
Control invariant terminal set formulation
Method and setup
...and 36 more sections

Figures (15)

Figure 1: Illustration of how the predictive safety filter modifies the nominal trajectory based on a safe set $\mathbb{X}$, terminal set $\mathbb{X}_f$, and number of shooting nodes $N$. The yellow path in the left-hand figure lies closer to the nominal unsafe path (red), but given the short prediction horizon, the PSF must take the shorter (dark green) path directly towards the terminal set. In the right-hand figure, with one additional shooting node, the PSF can compute a trajectory that lies closer to the nominal path and still reaches the terminal set "in time".
Figure 2: Visualization of the ship trajectory modification caused by the terminal safety constraint, with $N=1$ for clarity. Red arrows indicate the nominal (unsafe) trajectory, while green arrows indicate the PSF-modified trajectory.
Figure 3: Illustration of the RL + PSF control design. Note that the LiDAR perception features, environmental disturbances, and obstacles have been omitted from this figure for clarity.
Figure 4: RL agent diagram. The observation vector contains both LiDAR and navigation features. While the navigation features are used directly in the PPO algorithm, the LiDAR measurements are processed through a CNN. The PPO outputs an action $u_{\mathcal{L}}$ that is sent through the safety filter. Finally, the safe action $u_{0}$ can be executed in the environment.
Figure 5: Schematic of how the LiDAR detection works
...and 10 more figures

Theorems & Definitions (1)

Definition 1
