Distributional Reinforcement Learning based Integrated Decision Making and Control for Autonomous Surface Vehicles

Xi Lin, Paul Szenher, Yewei Huang, Brendan Englot

TL;DR

This paper tackles autonomous navigation for Autonomous Surface Vehicles (ASVs) in congested waterways where perception errors and disturbances complicate COLREGs compliance. It introduces a Distributional Reinforcement Learning approach, specifically an Actor-Critic Implicit Quantile Networks (AC-IQN) policy, to generate continuous thrust commands from onboard LiDAR and odometry data. The authors formalize the problem as a Markov Decision Process and model the return as a distribution Z, using IQN to represent quantiles of the return and an actor-critic setup to learn continuous control. Through high-fidelity Gazebo VRX simulations, AC-IQN demonstrates superior navigation safety and efficiency over state-of-the-art Distributional RL, non-Distributional RL, and classical baselines, indicating strong potential for practical autonomous marine navigation.

Abstract

With the growing demand for Autonomous Surface Vehicles (ASVs) in recent years, the number of ASVs deployed for maritime missions is expected to increase rapidly in the near future. However, it is still challenging for ASVs to perform sensor-based autonomous navigation in obstacle-filled and congested waterways, where perception errors, closely gathered vehicles and limited maneuvering space near buoys may make it difficult to follow the Convention on the International Regulations for Preventing Collisions at Sea (COLREGs). To address these issues, we propose a novel Distributional Reinforcement Learning based navigation system that works with onboard LiDAR and odometry sensors to generate arbitrary thrust commands in a continuous action space. Comprehensive evaluations of the proposed system in high-fidelity Gazebo simulations show its ability to decide whether to follow COLREGs or take other beneficial actions based on the scenarios encountered, offering superior navigation safety and efficiency compared to systems using state-of-the-art Distributional RL, non-Distributional RL and classical methods.

Paper Structure

This paper contains 13 sections, 20 equations, 8 figures, 2 tables, and 2 algorithms.

Figures (8)

  • Figure 1: The proposed navigation system. The top figure shows the perspective in the Gazebo simulation. The middle figure visualizes the segmentation result of LiDAR point clouds received by the lower right vehicle in the top figure. The bottom figure illustrates the decision making and control module of the proposed system.
  • Figure 2: Example training scenarios. The velocity of each vehicle is indicated by the red arrow, goal positions are plotted as green stars, and buoys are shown as black circles.
  • Figure 3: Learning performance. Each curve and its band width in the above cumulative reward and success rate plots reflect the values of the mean and standard error. To compute the average travel time, we only include data from robots that successfully reach their goals.
  • Figure 4: AC-IQN network architecture. FC, ReLU, COS, $\odot$ and CONCAT stand for Fully Connected Layer, Rectified Linear Unit, Cosine Embedding Layer, element-wise product and concatenation of tensors. The numbers after IN and OUT are the input and output dimension of a layer.
  • Figure 5: Head-on and crossing scenarios. Velocities of the ego vehicle that are consistent with COLREGs requirements are plotted in yellow. The COLREGs compliant velocity of each vehicle is computed by viewing it as the ego vehicle.
  • ...and 3 more figures
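
The Figure 4 caption mentions a cosine embedding layer (COS) and an element-wise product ($\odot$). The sketch below shows how these two pieces are commonly combined in Implicit Quantile Networks: sampled quantile fractions are expanded into cosine features, passed through a fully connected layer with ReLU, and multiplied element-wise with the state features. It follows the standard IQN formulation and is not taken from the paper; the function names, the dimensions (`feat_dim`, `n_cos`, `n_taus`), and the random weights are illustrative assumptions.

```python
import numpy as np

def cosine_embedding(taus, n_cos=64):
    """Expand quantile fractions tau in [0, 1] into features cos(pi * i * tau)."""
    i = np.arange(n_cos)                                # i = 0 .. n_cos-1
    return np.cos(np.pi * i[None, :] * taus[:, None])  # shape (n_taus, n_cos)

def quantile_features(state_feat, taus, W, b):
    """Combine state features with quantile embeddings (hypothetical helper)."""
    cos_feat = cosine_embedding(taus, W.shape[0])
    phi = np.maximum(cos_feat @ W + b, 0.0)             # FC layer followed by ReLU
    return state_feat[None, :] * phi                    # element-wise product

# Assumed sizes, chosen only for illustration.
rng = np.random.default_rng(0)
feat_dim, n_cos, n_taus = 8, 64, 4
state_feat = rng.standard_normal(feat_dim)              # stand-in for encoded observations
taus = rng.uniform(0.0, 1.0, size=n_taus)               # sampled quantile fractions
W = rng.standard_normal((n_cos, feat_dim)) * 0.1
b = np.zeros(feat_dim)

out = quantile_features(state_feat, taus, W, b)         # one feature row per quantile
```

Each row of `out` corresponds to one sampled quantile fraction; in an IQN-style critic, further layers would map each row to the return value at that quantile.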