Reinforcement Learning Ship Autopilot: Sample efficient and Model Predictive Control-based Approach

Yunduan Cui, Shigeki Osaki, Takamitsu Matsubara

TL;DR

This work tackles boat autopilot control under strong ocean disturbances when real-world data is scarce. It introduces SPMPC, a sample-efficient, probabilistic model-based RL framework that pairs Gaussian process dynamics learning with model predictive control, using a modified moment-matching approach for efficient long-horizon optimization within the MPC loop. The method is validated in simulation and in real-boat experiments, where it performs robustly, handles wind and current disturbances, and learns from a limited amount of data (thousands of samples). The results suggest SPMPC is a practical approach for real-world autonomous marine navigation and potentially for other high-disturbance robotics domains.

Abstract

In this research we focus on developing a reinforcement learning system for a challenging task: autonomous control of a real-sized boat, where the difficulties arise from large uncertainties in the ocean environment and the extremely high cost of exploring and sampling with a real boat. To this end, we explore a novel Gaussian process (GP)-based reinforcement learning approach that combines sample-efficient model-based reinforcement learning with model predictive control (MPC). Our approach, sample-efficient probabilistic model predictive control (SPMPC), iteratively learns a Gaussian process dynamics model and uses it to efficiently update control signals within the closed MPC control loop. A system using SPMPC is built to efficiently learn an autopilot task. After its performance is investigated in a simulation modeled on real boat driving data, the proposed system successfully learns, without human demonstration, to drive a real-sized boat equipped with a single engine and sensors measuring GPS, speed, direction, and wind.
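The loop described in the abstract (fit a GP dynamics model, plan with it inside an MPC loop, apply the first action, add the new observation to the model) can be illustrated with a small toy sketch. This is not the authors' implementation: SPMPC propagates uncertainty with moment matching, whereas the sketch below plans on the GP mean only and uses random shooting as a stand-in optimizer. The 1-D dynamics in `true_step`, the kernel settings, and every name (`GPDynamics`, `mpc_action`, `run`) are assumptions made for illustration.

```python
import numpy as np

def rbf(A, B, length=1.0, var=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

class GPDynamics:
    """GP regression from (state, action) to state change, 1-D toy case."""
    def __init__(self, noise=1e-3):
        self.noise = noise
        self.X = np.empty((0, 2))
        self.y = np.empty(0)

    def add(self, x, u, dx):
        # Append one transition and refit (fine for small data sets).
        self.X = np.vstack([self.X, [x, u]])
        self.y = np.append(self.y, dx)
        K = rbf(self.X, self.X) + self.noise * np.eye(len(self.y))
        self.alpha = np.linalg.solve(K, self.y)

    def mean(self, Z):
        # Posterior mean of the state change at query inputs Z.
        return rbf(Z, self.X) @ self.alpha

def true_step(x, u, rng):
    # Hypothetical toy system: thrust, a constant "current", small noise.
    return x + 0.2 * u - 0.05 + 0.01 * rng.standard_normal()

def mpc_action(gp, x, target, rng, horizon=5, n_candidates=200):
    # Random-shooting MPC: roll candidate action sequences through the
    # GP mean model and return the first action of the cheapest sequence.
    U = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    xs = np.full(n_candidates, x)
    cost = np.zeros(n_candidates)
    for t in range(horizon):
        xs = xs + gp.mean(np.column_stack([xs, U[:, t]]))
        cost += (xs - target) ** 2
    return U[np.argmin(cost), 0]

def run(n_init=10, n_steps=30, target=1.0, seed=0):
    rng = np.random.default_rng(seed)
    gp = GPDynamics()
    # Seed the model with a few random transitions.
    for _ in range(n_init):
        x, u = rng.uniform(-2, 2), rng.uniform(-1, 1)
        gp.add(x, u, true_step(x, u, rng) - x)
    x, actions = 0.0, []
    # Closed loop: plan, act, observe, update the model.
    for _ in range(n_steps):
        u = mpc_action(gp, x, target, rng)
        x_next = true_step(x, u, rng)
        gp.add(x, u, x_next - x)
        actions.append(u)
        x = x_next
    return gp, actions, x
```

Calling `gp, actions, x_final = run()` runs the closed loop once; replanning at every step is what lets the controller react to the disturbance term, and feeding each observed transition back into the GP is the iterative model learning the abstract refers to.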

Paper Structure

This paper contains 19 sections, 20 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: The Nissan Joy Fisher 25 for experiment (left) with GPS/speed/direction sensor, engine (right top), and the wind sensor (right bottom).
  • Figure 2: Overview of the MPC framework. Left: the MPC framework. Right: the MPC framework with long-term prediction in autonomous boat control.
  • Figure 3: The SPMPC System.
  • Figure 4: Convergence of SPMPC with Euclidean-distance and Mahalanobis-distance based cost functions.
  • Figure 5: Examples of (a) the baseline and SPMPC in the simulation task and (b) the RL exploration samples and initial samples.
  • ...and 3 more figures