Reinforcement Learning Ship Autopilot: Sample efficient and Model Predictive Control-based Approach
Yunduan Cui, Shigeki Osaki, Takamitsu Matsubara
TL;DR
This work tackles autopilot control of a boat under strong ocean disturbances when real-world data is scarce. It introduces SPMPC, a sample-efficient, probabilistic model-based RL framework that pairs Gaussian process dynamics learning with model predictive control, using a modified moment-matching approach for efficient long-horizon optimization within an MPC loop. The method is validated in simulation and in real-boat experiments, which show robust performance, effective handling of wind and current disturbances, and substantial data efficiency (thousands of samples). The results suggest SPMPC as a practical, scalable approach for real-world autonomous marine navigation and potentially for other high-disturbance robotics domains.
Abstract
In this research, we focus on developing a reinforcement learning system for a challenging task: autonomous control of a real-sized boat, where the difficulties arise from large uncertainties in the ocean environment and the extremely high cost of exploring and sampling with a real boat. To this end, we explore a novel Gaussian process (GP) based reinforcement learning approach that combines sample-efficient model-based reinforcement learning and model predictive control (MPC). Our approach, sample-efficient probabilistic model predictive control (SPMPC), iteratively learns a Gaussian process dynamics model and uses it to efficiently update control signals within the MPC closed control loop. A system using SPMPC is built to efficiently learn an autopilot task. After investigating its performance in a simulation modeled upon real boat driving data, the proposed system successfully learns, without human demonstration, to drive a real-sized boat in an autopilot task; the boat is equipped with a single engine and sensors measuring GPS position, speed, direction, and wind.
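To make the loop described in the abstract concrete, the following is a minimal sketch of the general pattern: fit a GP dynamics model on collected transitions, plan with it inside an MPC loop, apply only the first action, and add the new data to the training set. It is not the authors' implementation. The 1-D plant `true_step`, the fixed kernel hyperparameters, and all function names are hypothetical, and the planner uses mean-only GP predictions with random shooting in place of the modified moment-matching uncertainty propagation used by SPMPC.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

class GPDynamics:
    """GP regression from (state, action) to state change.
    Hyperparameters are fixed here (an assumption); they are not optimized."""
    def __init__(self, noise_var=1e-4):
        self.noise_var = noise_var

    def fit(self, X, y):
        self.X = X
        K = rbf_kernel(X, X) + self.noise_var * np.eye(len(X))
        self.alpha = np.linalg.solve(K, y)

    def predict_mean(self, Xq):
        # Posterior mean only; predictive variance is ignored in this sketch.
        return rbf_kernel(Xq, self.X) @ self.alpha

def true_step(x, u, rng):
    # Hypothetical toy plant standing in for the boat: unknown to the controller.
    return x + 0.5 * u - 0.1 * np.sin(x) + 0.01 * rng.standard_normal()

def mpc_action(model, x, target, rng, horizon=5, n_candidates=200):
    # Random-shooting MPC: sample action sequences, roll them out through the
    # GP mean, and return the first action of the lowest-cost sequence.
    U = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    xs = np.full(n_candidates, x, dtype=float)
    cost = np.zeros(n_candidates)
    for t in range(horizon):
        xs = xs + model.predict_mean(np.column_stack([xs, U[:, t]]))
        cost += (xs - target) ** 2
    return U[np.argmin(cost), 0]

def run_spmpc_sketch(n_trials=3, steps=30, target=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X, y = [], []
    # Initial rollout with random actions to seed the model.
    x = 0.0
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)
        x_next = true_step(x, u, rng)
        X.append([x, u])
        y.append(x_next - x)
        x = x_next
    model = GPDynamics()
    final_errors = []
    for _ in range(n_trials):
        # Refit the model on all data collected so far, then run one MPC trial.
        model.fit(np.array(X), np.array(y))
        x = 0.0
        for _ in range(steps):
            u = mpc_action(model, x, target, rng)
            x_next = true_step(x, u, rng)
            X.append([x, u])
            y.append(x_next - x)
            x = x_next
        final_errors.append(abs(x - target))
    return final_errors

if __name__ == "__main__":
    print(run_spmpc_sketch())
```

Replanning at every step is what lets the closed loop correct for disturbances and model error, since each new action is computed from the measured state. A fuller treatment would also propagate the GP's predictive uncertainty through the horizon, which is the role the moment-matching step plays in SPMPC.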
