Distributional Reinforcement Learning based Integrated Decision Making and Control for Autonomous Surface Vehicles
Xi Lin, Paul Szenher, Yewei Huang, Brendan Englot
TL;DR
This paper addresses sensor-based navigation for Autonomous Surface Vehicles (ASVs) in congested waterways, where perception errors and disturbances complicate COLREGs compliance. It introduces a Distributional Reinforcement Learning approach, specifically an Actor-Critic Implicit Quantile Networks (AC-IQN) policy, that generates continuous thrust commands from onboard LiDAR and odometry data. The authors formalize the problem as a Markov Decision Process and model the return as a distribution Z rather than a single expected value, using IQN to represent quantiles of the return and an actor-critic setup to learn continuous control. In high-fidelity Gazebo VRX simulations, AC-IQN achieves better navigation safety and efficiency than state-of-the-art Distributional RL, non-Distributional RL, and classical baselines, suggesting potential for practical autonomous marine navigation.
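To make "using IQN to represent quantiles of the return" concrete, the sketch below computes the quantile Huber loss commonly used to train quantile-based return estimators such as IQN. It is an illustration under stated assumptions, not the authors' training code: the function names `huber` and `quantile_huber_loss`, the threshold `kappa`, and the toy numbers are all hypothetical, and the summary above does not state which loss AC-IQN actually uses.

```python
def huber(u, kappa=1.0):
    """Huber loss: quadratic for |u| <= kappa, linear beyond it."""
    a = abs(u)
    return 0.5 * u * u if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(pred_quantiles, taus, target_samples, kappa=1.0):
    """Quantile Huber loss between predicted return quantiles and target samples.

    pred_quantiles: predicted values of the return Z at the fractions in taus
    taus:           quantile fractions in (0, 1), one per prediction
    target_samples: sampled target returns (e.g. reward + discounted next-state samples)
    All names here are illustrative assumptions, not taken from the paper.
    """
    total = 0.0
    for q, tau in zip(pred_quantiles, taus):
        for z in target_samples:
            u = z - q  # temporal-difference error for this (prediction, target) pair
            # Asymmetric weight: tau when the prediction is too low, 1 - tau when too high.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber(u, kappa) / kappa
    # Sum over predicted quantiles, average over target samples.
    return total / len(target_samples)


# For a high quantile (tau = 0.9), underestimating the target is penalized
# more heavily than overestimating it by the same amount.
low = quantile_huber_loss([0.0], [0.9], [1.0])
high = quantile_huber_loss([2.0], [0.9], [1.0])
print(low > high)
```

The asymmetric weighting is what pushes each output toward its own quantile of Z, so the learned set of outputs approximates the return distribution instead of only its mean.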
Abstract
With the growing demand for Autonomous Surface Vehicles (ASVs) in recent years, the number of ASVs deployed for various maritime missions is expected to increase rapidly in the near future. However, it is still challenging for ASVs to perform sensor-based autonomous navigation in obstacle-filled and congested waterways, where perception errors, closely gathered vehicles and limited maneuvering space near buoys can make it difficult to follow the Convention on the International Regulations for Preventing Collisions at Sea (COLREGs). To address these issues, we propose a novel Distributional Reinforcement Learning based navigation system that works with onboard LiDAR and odometry sensors to generate arbitrary thrust commands in a continuous action space. Comprehensive evaluations of the proposed system in high-fidelity Gazebo simulations show its ability to decide whether to follow COLREGs or take other beneficial actions based on the scenarios encountered, offering superior performance in navigation safety and efficiency compared to systems using state-of-the-art Distributional RL, non-Distributional RL and classical methods.
