Modular Control Architecture for Safe Marine Navigation: Reinforcement Learning and Predictive Safety Filters
Aksel Vaaler, Svein Jostein Husa, Daniel Menges, Thomas Nakken Larsen, Adil Rasheed
TL;DR
The paper tackles safety in learning-based autonomous marine navigation by integrating a modular Predictive Safety Filter (PSF) with model-free reinforcement learning (RL) to enforce physical and collision constraints during both training and operation. It builds on a 3-DOF Cybership II model, a fuzzy collision-risk metric, a disturbance observer, and an MPC-like PSF that minimally modifies the RL agent's proposed actions so that the predicted trajectory satisfies all constraints and ends in a safe terminal set. Solver runtimes below 10 ms make the filter feasible in real time. Key contributions include the design and verification of the PSF for marine collision avoidance, a demonstration of improved safety and learning efficiency across randomized scenarios, and successful transfer to tests in an environment resembling real-world conditions. The results show that the PSF can dramatically reduce collisions during learning while preserving, or even accelerating, policy convergence, making RL-based marine navigation safer and more practical for real deployments.
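For orientation, the generic PSF problem from the safety-filter literature can be written as below. The symbols are our own labels, not taken from the paper: $x(k)$ is the measured state, $u_L(k)$ the action proposed by the learned policy, $f$ a discretized dynamics model (here it would be the 3-DOF vessel model), $\mathcal{X}$ and $\mathcal{U}$ the state and input constraint sets (collision constraints would enter through $\mathcal{X}$), $N$ the prediction horizon and $\mathcal{S}_f$ the safe terminal set. The paper's exact cost weighting, constraint sets and disturbance handling may differ.

$$
\begin{aligned}
\min_{u_{0|k},\dots,u_{N-1|k}} \quad & \lVert u_L(k) - u_{0|k} \rVert^2 \\
\text{s.t.} \quad & x_{0|k} = x(k), \\
& x_{i+1|k} = f\big(x_{i|k}, u_{i|k}\big), \qquad i = 0,\dots,N-1, \\
& x_{i|k} \in \mathcal{X}, \quad u_{i|k} \in \mathcal{U}, \qquad i = 0,\dots,N-1, \\
& x_{N|k} \in \mathcal{S}_f,
\end{aligned}
\qquad u(k) = u^{*}_{0|k}.
$$

Only the first optimized input is applied, and the problem is re-solved at the next sample. If $u_L(k)$ already admits a feasible continuation into $\mathcal{S}_f$, the optimum is $u^{*}_{0|k} = u_L(k)$ and the filter leaves the learned action untouched.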
Abstract
Many autonomous systems face safety challenges that require robust closed-loop control to handle physical limitations and safety constraints. Real-world systems such as autonomous ships must cope with nonlinear dynamics and environmental disturbances. Reinforcement learning is increasingly used to adapt to complex scenarios, but standard frameworks that ensure safety and stability are still lacking. Predictive Safety Filters (PSF) offer a promising solution: they ensure constraint satisfaction in learning-based control without requiring the learning algorithm to handle constraints explicitly. This modular approach allows arbitrary control policies to be used, with the safety filter optimizing the proposed actions to meet physical and safety constraints. We apply this approach to marine navigation, combining RL with a PSF on a simulated Cybership II model. The RL agent is trained on path following and collision avoidance, while the PSF monitors and modifies its control actions for safety. Compared against a standard RL agent without a PSF, the results demonstrate that the PSF maintains safety without hindering the RL agent's learning rate or performance.
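The modular idea, an arbitrary policy wrapped by a filter that intervenes only when needed, can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: a hypothetical 1-D double integrator approaching an obstacle stands in for the 3-DOF Cybership II model, a full-braking backup trajectory replaces the general MPC prediction, and "reaching zero velocity" plays the role of the safe terminal set. The names `DoubleIntegrator`, `backup_is_safe`, `safety_filter` and `run_closed_loop` are ours. Because safety is monotone in the first input for this toy model, the "closest safe action" optimization is solved by bisection instead of a numerical solver.

```python
from dataclasses import dataclass


@dataclass
class DoubleIntegrator:
    """Hypothetical toy plant: position x, velocity v, acceleration input u."""
    dt: float = 0.1      # sampling time [s]
    u_max: float = 1.0   # input constraint |u| <= u_max [m/s^2]
    x_max: float = 10.0  # state constraint x <= x_max (obstacle position) [m]

    def step(self, x, v, u):
        # Exact discretization of constant acceleration over one sample.
        return x + v * self.dt + 0.5 * u * self.dt ** 2, v + u * self.dt


def backup_is_safe(model, x, v, u0, horizon):
    """Apply u0, then brake at -u_max; check x <= x_max at every sample and
    that the terminal set {v <= 0} is reached within `horizon` steps."""
    x, v = model.step(x, v, u0)
    if x > model.x_max:
        return False
    for _ in range(horizon - 1):
        if v <= 0.0:
            return True
        x, v = model.step(x, v, -model.u_max)
        if x > model.x_max:
            return False
    return v <= 0.0


def safety_filter(model, x, v, u_learned, horizon=50, iters=40):
    """Return the admissible first input closest to u_learned with a safe backup."""
    u = max(-model.u_max, min(model.u_max, u_learned))  # enforce input bounds
    if backup_is_safe(model, x, v, u, horizon):
        return u  # proposed action is already safe: filter stays inactive
    lo, hi = -model.u_max, u
    if not backup_is_safe(model, x, v, lo, horizon):
        return lo  # no certified input exists; fall back to full braking
    # Safe inputs form an interval [-u_max, u*] here, so bisect for u*.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if backup_is_safe(model, x, v, mid, horizon):
            lo = mid
        else:
            hi = mid
    return lo  # largest certified input found, i.e. closest to u_learned


def run_closed_loop(model, policy, steps=200):
    """Wrap an arbitrary policy with the filter and return visited positions."""
    x, v, xs = 0.0, 0.0, []
    for _ in range(steps):
        u = safety_filter(model, x, v, policy(x, v))
        x, v = model.step(x, v, u)
        xs.append(x)
    return xs


if __name__ == "__main__":
    model = DoubleIntegrator()
    # Stand-in for a learned policy that always requests full acceleration.
    xs = run_closed_loop(model, lambda x, v: 1.0)
    print(f"max position: {max(xs):.3f} m (limit {model.x_max} m)")
```

The filter never inspects the policy itself, only its proposed action, which is what makes the architecture modular: swapping the lambda for a trained RL agent requires no change to `safety_filter`. Recursive feasibility in this toy follows from the backup: if a braking continuation was safe at one step, applying full braking at the next step is still certified.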
