Multiple Ships Cooperative Navigation and Collision Avoidance using Multi-agent Reinforcement Learning with Communication

Y. Wang, Y. Zhao

TL;DR

This work proposes using the multi-agent deep deterministic policy gradient (MADDPG) algorithm with communication to address cooperation problems among multiple ships under partial observability, and develops two tasks based on OpenAI's gym environment: cooperative navigation and cooperative collision avoidance.

Abstract

In the real world, unmanned surface vehicles (USVs) often need to coordinate with each other to accomplish specific tasks. However, achieving cooperative control in multi-agent systems is challenging due to issues such as non-stationarity and partial observability. Recent advances in Multi-Agent Reinforcement Learning (MARL) offer new perspectives on these challenges. We therefore propose using the multi-agent deep deterministic policy gradient (MADDPG) algorithm with communication to address cooperation problems among multiple ships under partial observability. We developed two tasks based on OpenAI's gym environment: cooperative navigation and cooperative collision avoidance. In these tasks, ships must not only learn effective control strategies but also establish communication protocols with other agents. We analyze the impact of external noise on communication, the effect of inter-agent communication on performance, and the communication patterns learned by the agents. The results demonstrate that our proposed framework effectively addresses cooperative navigation and collision avoidance among multiple vessels and significantly outperforms traditional single-agent algorithms. Agents establish a consistent communication protocol that lets them compensate for missing information through shared observations and achieve better coordination.
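As a rough illustration of the centralized-training, decentralized-execution data flow with communication described in the abstract and in Figure 2, the sketch below shows how each agent's actor input could be assembled from its local observation plus the messages the other agents emitted at the previous time step, and how a centralized critic input concatenates all observations and actions. It is a minimal sketch under stated assumptions: splitting each action into control outputs (e.g., propeller revolution rate and rudder angle) followed by a message vector, the flat-list layout, and the names `split_action`, `actor_input`, and `critic_input` are hypothetical and not the authors' code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

def split_action(action: Sequence[float], n_control: int) -> Tuple[Vector, Vector]:
    """Hypothetical layout: the first n_control entries drive the ship
    (e.g. propeller revolution rate, rudder angle); the rest is the message."""
    return list(action[:n_control]), list(action[n_control:])

def actor_input(own_obs: Sequence[float],
                prev_messages: Sequence[Sequence[float]],
                agent_id: int) -> Vector:
    """Decentralized execution: agent i acts on its local observation plus
    the messages every *other* agent emitted at the previous time step."""
    received = [x for j, msg in enumerate(prev_messages) if j != agent_id for x in msg]
    return list(own_obs) + received

def critic_input(all_obs: Sequence[Sequence[float]],
                 all_actions: Sequence[Sequence[float]]) -> Vector:
    """Centralized training: the critic sees every agent's observation and action."""
    return [x for o in all_obs for x in o] + [x for a in all_actions for x in a]

# Two ships, 2-D local observations, 2 control outputs + 1 message value each.
actions_t = [[0.8, -0.1, 0.3], [0.5, 0.2, -0.7]]
messages_t = [split_action(a, 2)[1] for a in actions_t]
obs_next = [[1.0, 2.0], [3.0, 4.0]]
print(actor_input(obs_next[0], messages_t, agent_id=0))  # [1.0, 2.0, -0.7]
```

Only the actor input is needed at execution time, which is why each ship can act on local information plus received messages once training is over.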

Paper Structure

This paper contains 19 sections, 17 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Schematic diagram of typical work scenarios of USVs. Due to technological limitations and environmental uncertainty, USVs can only partially observe the environment most of the time, so they must be able to make decisions based on imperfect information. In addition, a truly smart USV should know how to collaborate with other entities to accomplish tasks such as cooperative navigation and cooperative collision avoidance.
  • Figure 2: (a) An illustration of a USV. (b) Reference system for modeling ship dynamics. (c) Overview of the centralized training and decentralized execution approach. Each agent $i$ receives observations $o_i$ and executes actions $a_i$. During training, a central critic has access to the local observations and actions of all agents. (d) The propeller revolution rate and rudder angle control the ship's movement. (e) The ship trajectory predicted by the MMG model agrees well with the experiment in the turning test. (f) MADDPG framework for agent $i$ in our multi-ship environment. Both the actor and the critic contain two subnetworks: a main network and a target network. MADDPG allows agents to communicate: a message generated by one agent at time step $t$ can be used as part of the observation of the other agents.
  • Figure 3: (a) In the ship cooperative navigation scenario, ships can observe their positions relative to all landmarks but do not know which one is the real target. The human proxy (speaker) knows the color of the real target, so it must learn to output messages that guide the ships to their corresponding targets. (b) Reward history of MADDPG compared with other methods. (c) Typical results for the $N = 1$ case. The gray circle represents the human proxy (speaker), and the colored circles are landmarks. Please refer to movies S1 and S2 in the supplementary material for animations of additional cases. (d) An illustration of noisy channels. In the binary symmetric channel (BSC), we impose flip errors on the discrete digital signal, while in the additive white Gaussian noise (AWGN) channel, Gaussian white noise is added. (e) The training success rate decreases with $P_e$ and $\sigma^2$ in the BSC (left) and AWGN (right) channels, respectively. (f) For the $N = 1$ case, the average number of steps required to reach the goal increases with $P_e$ in the BSC channel (green solid line). The red dotted line indicates the minimum number of steps, which we estimate using a PID controller. (g) The effect of additive noise on the average number of steps.
  • Figure 4: (a) In the cooperative collision avoidance scenario, both ships are required to reach the goal and avoid collision in a COLREGs-compliant way. However, they are constrained by partial observability: each ship can only observe its own status and its position relative to the target. Apart from communication, one ship cannot obtain any status information about the other. (b) Encounter situations defined by COLREGs. (c) Reward history of MADDPG compared with other methods. (d) Bow crossing may happen when one ship is close to the meeting point while the other is far away. (e) In cooperative collision avoidance mode, both the stand-on ship and the give-way ship take actions to prevent collision in the (I) overtaking, (II) port crossing, (III) starboard crossing, and (IV) head-on scenarios. (f) In non-cooperative mode, only the give-way ship alters its rudder angle; the stand-on ship takes no action. (g) Compared with the cooperative mode, the give-way ship adopts larger rudder angles. (i) A t-test suggests no significant difference in success rate between the cooperative and non-cooperative modes. (j) A t-test suggests that the total traveling distance is significantly lower when the two ships cooperate to avoid collision. (h) Communication vector length in the four encounter situations illustrated in (e). Ships may adopt different communication patterns before and after the two ships meet. (k) Before the two ships meet, the communication vector length is closely related to the ship's distance to the meeting point. (l) The farther the distance, the larger the communication vector length.
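Figure 2 (f) notes that both the actor and the critic keep a main network and a target network. A common way to keep the two in sync in DDPG-style methods is Polyak (soft) averaging. The caption does not say which rule or coefficient the authors use, so the rule and the default `tau=0.01` below are assumptions, and parameters are modeled as a flat list of floats for brevity.

```python
from typing import List, Sequence

def soft_update(target: Sequence[float], main: Sequence[float],
                tau: float = 0.01) -> List[float]:
    """Polyak averaging: theta_target <- tau * theta_main + (1 - tau) * theta_target.
    tau=0.01 is an illustrative default, not a value taken from the paper."""
    if len(target) != len(main):
        raise ValueError("target and main must have the same number of parameters")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return [tau * m + (1.0 - tau) * t for t, m in zip(target, main)]

target = [0.0, 1.0]
main = [1.0, 1.0]
print(soft_update(target, main, tau=0.5))  # [0.5, 1.0]
```

A small `tau` makes the target network trail the main network slowly, which keeps the bootstrapped critic targets from shifting abruptly between updates.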
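The two noisy channels in Figure 3 (d) can be simulated in a few lines. The sketch below assumes the message has already been quantized to bits for the BSC, where each bit flips independently with probability $P_e$, and is a real-valued vector for the AWGN channel, where zero-mean Gaussian noise with variance $\sigma^2$ is added. How the authors quantize messages is not stated, and the function names `bsc` and `awgn` are hypothetical.

```python
import random
from typing import List, Sequence

def bsc(bits: Sequence[int], p_e: float, rng: random.Random) -> List[int]:
    """Binary symmetric channel: flip each bit independently with probability p_e."""
    return [b ^ 1 if rng.random() < p_e else b for b in bits]

def awgn(signal: Sequence[float], sigma2: float, rng: random.Random) -> List[float]:
    """Additive white Gaussian noise channel: add N(0, sigma2) noise to every entry."""
    sigma = sigma2 ** 0.5  # random.gauss expects the standard deviation, not the variance
    return [x + rng.gauss(0.0, sigma) for x in signal]

rng = random.Random(0)
sent = [rng.randint(0, 1) for _ in range(10_000)]
received = bsc(sent, p_e=0.1, rng=rng)
flip_rate = sum(s != r for s, r in zip(sent, received)) / len(sent)
print(f"empirical flip rate: {flip_rate:.3f}")  # close to 0.1
```

Sweeping `p_e` or `sigma2` upward and measuring task success is one way to reproduce the kind of trend shown in Figure 3 (e).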
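Figure 3 (f) uses a PID controller to estimate the minimum number of steps to the goal. The controller's input signal and gains are not given, so as a hypothetical example the sketch below is a textbook discrete-time PID that maps a heading error (degrees) to a rudder command, with the output clipped to an assumed rudder limit of ±35°.

```python
from typing import Optional

class PID:
    """Textbook discrete-time PID controller (illustrative; gains are placeholders)."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float,
                 output_limit: Optional[float] = None) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.output_limit = output_limit  # e.g. an assumed maximum rudder angle
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def step(self, error: float) -> float:
        """Return the control command for the current error (setpoint - measurement)."""
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        if self.output_limit is not None:
            u = max(-self.output_limit, min(self.output_limit, u))
        return u

# Hypothetical gains: steer toward a waypoint 20 degrees to starboard.
rudder = PID(kp=1.5, ki=0.125, kd=0.8, dt=1.0, output_limit=35.0)
print(rudder.step(20.0))  # 32.5
```

Counting the steps such a controller needs to reach the goal in a noise-free setting gives a reference line like the one in Figure 3 (f).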
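The four encounter types in Figure 4 (b) and (e) can be told apart from relative bearings and courses. The sketch below is a common simplification, not the authors' classifier: following COLREGs Rule 13, a ship approaching from more than 22.5° abaft the other's beam (relative bearing between 112.5° and 247.5°) counts as overtaking; an assumed ±6° tolerance for a nearly dead-ahead ship on a nearly reciprocal course counts as head-on; everything else is a crossing on the side where the other ship lies. It does not check whether a collision risk actually exists (for example via the closest point of approach). Positions are (east, north) and headings are degrees clockwise from north; the function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def relative_bearing(own_xy: Point, own_heading: float, other_xy: Point) -> float:
    """Bearing of other_xy seen from own_xy, clockwise from own bow, in [0, 360)."""
    dx = other_xy[0] - own_xy[0]  # east offset
    dy = other_xy[1] - own_xy[1]  # north offset
    true_bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return (true_bearing - own_heading) % 360.0

def classify_encounter(own_xy: Point, own_heading: float,
                       other_xy: Point, other_heading: float,
                       head_on_tol: float = 6.0) -> str:
    rb_other = relative_bearing(own_xy, own_heading, other_xy)  # other ship seen from own ship
    rb_own = relative_bearing(other_xy, other_heading, own_xy)  # own ship seen from other ship
    course_diff = (other_heading - own_heading) % 360.0

    def abaft_beam(rb: float) -> bool:  # more than 22.5 deg abaft the beam (Rule 13)
        return 112.5 < rb < 247.5

    if abaft_beam(rb_other) or abaft_beam(rb_own):
        return "overtaking"
    dead_ahead = rb_other <= head_on_tol or rb_other >= 360.0 - head_on_tol
    if dead_ahead and abs(course_diff - 180.0) <= head_on_tol:
        return "head-on"
    return "starboard crossing" if rb_other < 180.0 else "port crossing"

print(classify_encounter((0.0, 0.0), 0.0, (10.0, 10.0), 270.0))  # starboard crossing
```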
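Figure 4 (i) and (j) report t-tests comparing the cooperative and non-cooperative modes. The caption does not say which variant was used; the sketch below assumes Welch's unequal-variance t-test and computes only the t statistic and the Welch-Satterthwaite degrees of freedom with the standard library. A p-value would then come from the Student t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. The inputs are toy numbers, not results from the paper.

```python
import math
import statistics
from typing import Sequence, Tuple

def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Toy numbers for illustration only.
t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.4f}, df = {df:.1f}")  # t = -3.6742, df = 4.0
```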
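Figure 4 (h), (k), and (l) relate the communication vector length to the ship's distance to the meeting point. Assuming "length" means the Euclidean norm of the message vector (the caption does not define it), the relationship could be quantified with a Pearson correlation as below. The function names are hypothetical and the trajectory values are toy numbers, not data from the paper.

```python
import math
from typing import Sequence

def message_length(message: Sequence[float]) -> float:
    """Euclidean (L2) norm of one message vector (assumed definition of 'length')."""
    return math.sqrt(sum(x * x for x in message))

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy trajectory: distance to the meeting point and the message emitted at each step.
distances = [40.0, 30.0, 20.0, 10.0]
messages = [[0.8, 0.6], [0.6, 0.45], [0.4, 0.3], [0.2, 0.15]]
lengths = [message_length(m) for m in messages]
print(round(pearson(distances, lengths), 3))  # 1.0
```

A coefficient near +1 would match the "farther distance, larger vector length" pattern described in panel (l); restricting the samples to steps before the ships meet mirrors panel (k).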