2-Level Reinforcement Learning for Ships on Inland Waterways: Path Planning and Following

Martin Waltz, Niklas Paulig, Ostap Okhrin

TL;DR

A novel application of a spatial-temporal recurrent neural network architecture to continuous action spaces, used to control autonomous surface vehicles (ASVs) on inland waterways (IWs) with deep reinforcement learning (DRL).

Abstract

This paper proposes a realistic, modularized framework for controlling autonomous surface vehicles (ASVs) on inland waterways (IWs) based on deep reinforcement learning (DRL). The framework improves operational safety and comprises two levels: a high-level local path planning (LPP) unit and a low-level path following (PF) unit, each consisting of a DRL agent. The LPP agent plans a path that accounts for dynamic vessels, closing a gap in the current research landscape. In addition, the LPP agent adequately considers traffic rules and the geometry of the waterway. We thereby introduce a novel application of a spatial-temporal recurrent neural network architecture to continuous action spaces. The LPP agent outperforms a state-of-the-art artificial potential field (APF) method by increasing the minimum distance to other vessels by 65% on average. The PF agent performs low-level actuator control while accounting for shallow water influences and the environmental forces of wind, waves, and currents. Compared with a proportional-integral-derivative (PID) controller, the PF agent yields only 61% of the mean cross-track error (MCTE) while significantly reducing control effort (CE) in terms of the required absolute rudder angle. Lastly, both agents are jointly validated in simulation, employing the lower Elbe in northern Germany as an example case and using real automatic identification system (AIS) trajectories to model the behavior of other ships.
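To make the two PF evaluation metrics named in the abstract concrete, the following is a minimal Python sketch of how a mean cross-track error and a rudder-based control effort could be computed from a logged trajectory. The exact definitions used in the paper are not given in this excerpt, so everything here is an assumption: the path is taken to be a piecewise-linear sequence of waypoints in a local metric frame, the cross-track error is the unsigned distance to the nearest path segment, and the control effort is the mean absolute rudder angle. All function names are hypothetical.

```python
import math


def cross_track_error(p, a, b):
    """Unsigned distance from point p to the segment a-b (2-D tuples, metres)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: fall back to the distance to the single point.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def mean_cross_track_error(positions, path):
    """Assumed MCTE: mean distance of each logged position to the nearest path segment."""
    segments = list(zip(path, path[1:]))
    errors = [min(cross_track_error(p, a, b) for a, b in segments) for p in positions]
    return sum(errors) / len(errors)


def control_effort(rudder_angles):
    """Assumed CE: mean absolute rudder angle over the episode (degrees)."""
    return sum(abs(delta) for delta in rudder_angles) / len(rudder_angles)


# Illustrative data only; these values are not taken from the paper.
path = [(0.0, 0.0), (100.0, 0.0)]
positions = [(10.0, 2.0), (50.0, -4.0), (90.0, 0.0)]
rudder_angles = [5.0, -10.0, 0.0, 5.0]

print(mean_cross_track_error(positions, path))  # 2.0
print(control_effort(rudder_angles))            # 5.0
```

Under these assumed definitions, the abstract's comparison would mean that the ratio of the PF agent's MCTE to the PID controller's MCTE, evaluated on the same path, is about 0.61.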

Paper Structure (48 sections, 35 equations, 23 figures, 4 tables, 2 algorithms)

Figures (23)

  • Figure 1: The proposed architecture for an ASV based on DRL, with the visualization being inspired by chen2016path.
  • Figure 1: Visualization of the forces of the APF method building on liu2023colregs and wang2019obstacle.
  • Figure 1: Trajectories of the validation scenarios of the LPP agent on a right curve. Note that the latitude and longitude values are artificial and serve as orientation.
  • Figure 1: Interpolation between two AIS messages of two ships, to receive information at the query time $t_q$; inspired by rong2022ship.
  • Figure 2: The simplification of the LPP procedure if no target ships are present (left), and an overtaking maneuver (right).
  • ...and 18 more figures