
Decentralized Multi-Robot Navigation for Autonomous Surface Vehicles with Distributional Reinforcement Learning

Xi Lin, Yewei Huang, Fanfei Chen, Brendan Englot

TL;DR

We propose a decentralized multi-ASV collision avoidance policy based on Distributional Reinforcement Learning that considers the interactions among ASVs as well as with static obstacles and current flows. In simulation experiments, the policy achieves superior navigation safety while requiring minimal travel time and energy.

Abstract

Collision avoidance algorithms for Autonomous Surface Vehicles (ASVs) that follow the Convention on the International Regulations for Preventing Collisions at Sea (COLREGs) have been proposed in recent years. However, following COLREGs may be difficult and unsafe in congested waters, where multiple ASVs navigate in the presence of static obstacles and strong currents and their interactions become complex. To address this problem, we propose a decentralized multi-ASV collision avoidance policy based on Distributional Reinforcement Learning, which considers the interactions among ASVs as well as with static obstacles and current flows. In simulation experiments, we evaluate the proposed Distributional RL-based policy against a traditional RL-based policy and two classical methods, Artificial Potential Fields (APF) and Reciprocal Velocity Obstacles (RVO). The results show that the proposed policy achieves superior performance in navigation safety while requiring minimal travel time and energy. A variant of our framework that automatically adapts its risk sensitivity is also shown to improve ASV safety in highly congested environments.

Paper Structure (10 sections, 26 equations, 7 figures, 1 table, 1 algorithm)


Figures (7)

  • Figure 1: Our decentralized decision making framework.
  • Figure 2: IQN network model. FC, COS and ReLU stand for fully connected layer, cosine embedding layer, and rectified linear unit. Outputs of the model are the return distributions of actions.
  • Figure 3: Training environments. Examples of random environments generated according to the training schedule (environments of increasing difficulty are shown from left to right, top to bottom). The initial poses and velocities of the robots are indicated with green rectangles and red arrows.
  • Figure 4: Evaluation performance during training. Solid lines and bandwidths indicate the mean and standard error over the results of all learning models.
  • Figure 5: Experimental results. Top and bottom rows show the results of experiments without and with static obstacles, respectively. Time and energy plots show the distributions of travel time and energy consumption of the robots in successful episodes.
  • ...and 2 more figures
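
The Figure 2 caption names three building blocks of the IQN model: a cosine embedding layer (COS), fully connected layers (FC), and ReLU. The sketch below illustrates how these pieces typically fit together in a generic Implicit Quantile Network head; it is not the authors' implementation. All layer sizes, the random weights, the elementwise merge of state features with the quantile embedding, and the function names `cosine_features`, `quantile_embedding`, and `action_quantiles` are assumptions made for illustration.

```python
import numpy as np


def cosine_features(taus, n_cos):
    """Cosine embedding: cos(pi * i * tau) for i = 0 .. n_cos - 1."""
    taus = np.asarray(taus, dtype=np.float64)
    i = np.arange(n_cos)
    # Shape (n_taus, n_cos) via broadcasting.
    return np.cos(np.pi * i[None, :] * taus[:, None])


def quantile_embedding(taus, weight, bias):
    """COS -> FC -> ReLU: embed each quantile fraction tau."""
    feats = cosine_features(taus, weight.shape[0])
    return np.maximum(feats @ weight + bias, 0.0)


def action_quantiles(state_features, taus, params):
    """Return an (n_taus, n_actions) array of quantile values.

    Each column holds samples of one action's return distribution.
    """
    phi = quantile_embedding(taus, params["W_cos"], params["b_cos"])
    # Assumed merge step: elementwise product of state and quantile features.
    merged = state_features[None, :] * phi
    return merged @ params["W_out"] + params["b_out"]


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_cos, hidden, n_actions, n_taus = 64, 32, 5, 8
params = {
    "W_cos": rng.normal(size=(n_cos, hidden)) * 0.1,
    "b_cos": np.zeros(hidden),
    "W_out": rng.normal(size=(hidden, n_actions)) * 0.1,
    "b_out": np.zeros(n_actions),
}
state_features = rng.normal(size=hidden)  # stand-in for encoded observations
taus = rng.uniform(size=n_taus)           # sampled quantile fractions in [0, 1)

quantiles = action_quantiles(state_features, taus, params)
# Risk-neutral choice: average the sampled quantiles of each action.
greedy_action = int(np.argmax(quantiles.mean(axis=0)))
```

Averaging over quantile fractions drawn from the full unit interval gives a risk-neutral action value. Restricting the sampled fractions to the lower part of the interval would weight poor outcomes more heavily, which is one common way to make a distributional policy risk-sensitive; whether the paper's adaptive variant works this way is not stated in this section.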