Falsification-Driven Reinforcement Learning for Maritime Motion Planning

Marlon Müller, Florian Finkeldei, Hanna Krasowski, Murat Arcak, Matthias Althoff

TL;DR

This work tackles the challenge of training autonomous vessels to consistently comply with COLREGs in open-sea navigation. It introduces a falsification-driven reinforcement learning (RL) framework that strengthens training by generating adversarial scenarios in which the vessel violates traffic rules expressed as signal temporal logic (STL) specifications. Key contributions include a practical CMA-ES-based falsification algorithm, an extended robustness framework for maritime STL specifications, and empirical evidence of improved and more consistent rule compliance (including in nonvacuous encounters) over a baseline RL approach. By producing more relevant training scenarios, the approach could improve the safety of autonomous maritime systems, though real-world validation and multi-agent scalability remain important future steps.

Abstract

Compliance with maritime traffic rules is essential for the safe operation of autonomous vessels, yet training reinforcement learning (RL) agents to adhere to them is challenging. The behavior of RL agents is shaped by the training scenarios they encounter, but creating scenarios that capture the complexity of maritime navigation is non-trivial, and real-world data alone is insufficient. To address this, we propose a falsification-driven RL approach that generates adversarial training scenarios in which the vessel under test violates maritime traffic rules, which are expressed as signal temporal logic specifications. Our experiments on open-sea navigation with two vessels demonstrate that the proposed approach provides more relevant training scenarios and achieves more consistent rule compliance.

Paper Structure

This paper contains 21 sections, 25 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: Vessel encounters and rule-compliant maneuvers.
  • Figure 2: Overview of the proposed framework. Every $f_{\mathrm{falsification}}$ training steps, the falsification process identifies scenarios where the agent policy $\pi^{\mathrm{E}}$ fails to comply with the specification $\varphi$. These scenarios are used for training the agent with standard reinforcement learning: the reward is $r_k^{\mathrm{E}}$, the observation is $o_k^{\mathrm{E}}$, and the agent action is $u_k^{\mathrm{E}}$.
  • Figure 3: Illustration of the robustness measures (a) $h_{\texttt{p}}\coloneq h_{\texttt{position\_halfplane}}$ and (b) $h_{\texttt{o}}\coloneq h_{\texttt{orientation\_halfplane}}$ with $\beta,\gamma \coloneq \pi/4$.
  • Figure 4: Distribution of reward.
  • Figure 5: Distribution of encounter types.
  • ...and 1 more figure
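
The loop described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the scenario is a single scalar, the policy is a single scalar `policy_gain`, `robustness` is a made-up stand-in for an STL robustness measure (negative means the specification is violated), and plain random search stands in for the CMA-ES-based falsification mentioned in the TL;DR. All function and variable names are hypothetical.

```python
import random
from typing import List

def robustness(policy_gain: float, scenario: float) -> float:
    """Toy stand-in for STL robustness; a negative value means a violation."""
    return policy_gain - abs(scenario)

def falsify(policy_gain: float, rng: random.Random, n_samples: int = 50) -> List[float]:
    """Random-search falsification (assumption: replaces CMA-ES for brevity).

    Returns the sampled scenarios in which the current policy violates the spec.
    """
    candidates = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    return [s for s in candidates if robustness(policy_gain, s) < 0.0]

def train(total_steps: int, f_falsification: int, seed: int = 0) -> float:
    """Train a toy policy, refreshing the scenario pool every f_falsification steps."""
    rng = random.Random(seed)
    policy_gain = 0.0
    pool = [rng.uniform(-1.0, 1.0)]  # initial training scenario
    for step in range(total_steps):
        if step % f_falsification == 0:
            found = falsify(policy_gain, rng)
            if found:  # keep the old pool if no violation was found
                pool = found
        scenario = rng.choice(pool)
        # Toy "RL update": if the spec is violated, nudge the policy toward
        # the value that would satisfy it in this scenario.
        if robustness(policy_gain, scenario) < 0.0:
            policy_gain += 0.1 * (abs(scenario) - policy_gain)
    return policy_gain

if __name__ == "__main__":
    gain = train(total_steps=200, f_falsification=20)
    remaining = falsify(gain, random.Random(1))
    print(f"trained gain: {gain:.3f}, violating scenarios found: {len(remaining)}")
```

The design point the sketch tries to capture is that the scenario pool is regenerated against the current policy, so training keeps concentrating on cases the policy still fails rather than on a fixed scenario set.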