Control and Coordination of a SWARM of Unmanned Surface Vehicles using Deep Reinforcement Learning in ROS

Shrudhi R S; Sreyash Mohanty; Susan Elias

TL;DR

The paper addresses the challenge of coordinating multiple unmanned surface vehicles (USVs) to clean marine debris in dynamic water environments. It proposes a Multi-Agent Deep Deterministic Policy Gradient (MA-DDPG) framework with centralized critics and decentralized actors, implemented within the ROS and Gazebo simulation stack and integrated via the openai_ros package to connect ROS with OpenAI Gym. Key contributions include the ROS/Gazebo-enabled WAM-V architecture, a concrete observation-action space for USVs, and a tailored reward shaping scheme to promote rapid, safe debris collection while maintaining swarm coordination (e.g., $R = w_1 R_{collect} - w_2 P_{coll} - w_3 P_{time} + w_4 R_{coord}$). The results demonstrate improved coordination, scalability, and stability in simulated swarms, underscoring the practicality of modular, ROS-based RL pipelines for autonomous marine swarm control and their potential extension to other ROS-enabled robotic swarms.
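
The function below is a minimal sketch of the TL;DR's example reward $R = w_1 R_{collect} - w_2 P_{coll} - w_3 P_{time} + w_4 R_{coord}$, evaluated once per time step. Everything beyond the formula's structure is an assumption and does not come from the paper: the weight values, treating $P_{coll}$ as a binary collision flag, charging a constant $P_{time} = 1$ per step, and the hypothetical `coordination_bonus` definition of $R_{coord}$ (the fraction of targeted USVs pursuing distinct debris items).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the paper's actual values are not given here.
    w_collect: float = 1.0    # w_1: reward per debris item collected
    w_collision: float = 5.0  # w_2: penalty for a collision
    w_time: float = 0.01      # w_3: constant per-step time penalty
    w_coord: float = 0.5      # w_4: swarm-coordination bonus


def coordination_bonus(targets: List[Optional[int]]) -> float:
    """Hypothetical R_coord: share of targeted USVs chasing distinct debris items.

    Returns 1.0 when no two USVs target the same item, less when they
    duplicate effort, and 0.0 when no USV currently has a target.
    """
    assigned = [t for t in targets if t is not None]
    if not assigned:
        return 0.0
    return len(set(assigned)) / len(assigned)


def shaped_reward(collected: int, collided: bool, r_coord: float,
                  w: RewardWeights = RewardWeights()) -> float:
    # R = w1*R_collect - w2*P_coll - w3*P_time + w4*R_coord
    r_collect = float(collected)       # R_collect: items picked up this step
    p_coll = 1.0 if collided else 0.0  # P_coll: binary collision indicator (assumed)
    p_time = 1.0                       # P_time: charged once per step (assumed)
    return (w.w_collect * r_collect - w.w_collision * p_coll
            - w.w_time * p_time + w.w_coord * r_coord)
```

Because the time penalty is charged every step, an episode that collects the same debris in fewer steps earns a higher return. The collision weight is deliberately set above the collection weight in this sketch, so a single collision outweighs several pickups; tuning that ratio is the main lever for trading speed against safety.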

Abstract

An unmanned surface vehicle (USV) can perform complex missions by continuously observing the state of its surroundings and taking actions toward a goal. A SWARM of USVs working together can complete missions faster and more effectively than a single USV. In this paper, we propose an autonomous communication model for a swarm of USVs, implemented as a software system using the Robot Operating System (ROS) and Gazebo. With coordinated task completion as the main objective, we formulate the task decision problem as a Markov decision process (MDP) to achieve efficient localization and tracking in a highly dynamic water environment. To coordinate multiple USVs performing real-time target tracking, we propose an enhanced multi-agent reinforcement learning approach based on MA-DDPG (Multi-Agent Deep Deterministic Policy Gradient), an extension of the Deep Deterministic Policy Gradient (DDPG) algorithm that allows decentralized control of multiple agents in a cooperative environment. MA-DDPG's decentralized control lets each agent make decisions based on its own observations and objectives, which can lead to better overall performance and improved stability. Additionally, it provides communication and coordination among agents through shared observations and rewards.
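
The centralized-critic, decentralized-actor structure that MA-DDPG adds to DDPG can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: linear function approximators stand in for the deep actor and critic networks, there is no replay buffer, exploration noise, or ROS/Gazebo interface, and all names (`LinearActor`, `LinearCritic`, `td_target`, `critic_step`, `actor_step`, `soft_update`) and default hyperparameters are hypothetical. Each actor maps only its own observation $o_i$ to an action, while agent $i$'s critic $Q_i$ scores the joint observations and actions of all agents. The critic target is $y_i = r_i + \gamma Q'_i(o'_1,\dots,o'_N, \mu'_1(o'_1),\dots,\mu'_N(o'_N))$, computed with target networks that track the learned ones through soft updates $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def flatten(parts: List[Vector]) -> Vector:
    return [v for part in parts for v in part]


class LinearActor:
    """Decentralized actor mu_i: sees ONLY its own observation o_i."""

    def __init__(self, obs_dim: int, act_dim: int, rng: random.Random) -> None:
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                  for _ in range(act_dim)]

    def act(self, obs: Vector) -> Vector:
        # tanh bounds each action component (e.g. a normalized thrust) to [-1, 1]
        return [math.tanh(dot(row, obs)) for row in self.w]


class LinearCritic:
    """Centralized critic Q_i: sees every agent's observation and action."""

    def __init__(self, joint_dim: int, rng: random.Random) -> None:
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(joint_dim)]

    def q(self, all_obs: List[Vector], all_acts: List[Vector]) -> float:
        return dot(self.w, flatten(all_obs) + flatten(all_acts))


def td_target(reward: float, next_obs: List[Vector], target_actors: List[LinearActor],
              target_critic: LinearCritic, gamma: float = 0.95, done: bool = False) -> float:
    # y_i = r_i + gamma * Q'_i(o', a'), where a'_j = mu'_j(o'_j) comes from the TARGET actors
    next_acts = [actor.act(o) for actor, o in zip(target_actors, next_obs)]
    return reward + (0.0 if done else gamma * target_critic.q(next_obs, next_acts))


def critic_step(critic: LinearCritic, all_obs: List[Vector], all_acts: List[Vector],
                y: float, lr: float = 0.01) -> float:
    """One gradient step on (Q_i - y)^2; returns the loss before the step."""
    x = flatten(all_obs) + flatten(all_acts)
    err = dot(critic.w, x) - y
    # For a linear critic, d(err^2)/dw = 2 * err * x
    critic.w = [wi - lr * 2.0 * err * xi for wi, xi in zip(critic.w, x)]
    return err * err


def actor_step(actor: LinearActor, critic: LinearCritic, i: int,
               all_obs: List[Vector], all_acts: List[Vector], lr: float = 0.01) -> None:
    """Gradient ascent on Q_i through agent i's own action a_i = mu_i(o_i).

    Other agents' actions stay as given (in MA-DDPG they come from the replay buffer).
    """
    # Position of a_i inside the critic's joint input vector
    offset = len(flatten(all_obs)) + len(flatten(all_acts[:i]))
    a_i = actor.act(all_obs[i])
    for k, row in enumerate(actor.w):
        dq_da = critic.w[offset + k]  # linear critic: dQ/da_ik is a single weight
        dtanh = 1.0 - a_i[k] ** 2     # tanh'(z) = 1 - tanh(z)^2
        actor.w[k] = [wkj + lr * dq_da * dtanh * oj
                      for wkj, oj in zip(row, all_obs[i])]


def soft_update(target: Vector, source: Vector, tau: float = 0.01) -> Vector:
    # theta' <- tau * theta + (1 - tau) * theta'
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]
```

In a full MA-DDPG loop, these critic and actor steps would run on minibatches sampled from a replay buffer, with one critic per agent. Only the actors are needed at execution time, which is what allows each USV to act on its own observations once training is complete.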

Paper Structure (14 sections, 10 figures)

This paper contains 14 sections and 10 figures.

Introduction
Related Work
Proposed Algorithm
Multi-Agent Reinforcement Learning
Deep Deterministic Policy Gradient
Multi-agent Deep Deterministic Policy Gradient
Robot Operating System (ROS)
Observation and Action Spaces
Implementation
Architecture of WAM-V in ROS and Gazebo
Overview of the openai_ros package for Integrating ROS and OpenAI
Reward Shaping
Applications
Conclusion

Figures (10)

Figure 1: Schematic of Multi-agent Reinforcement Learning Paradigm
Figure 2: Schematic of Deep Deterministic Policy Gradient
Figure 3: Schematic of Multi-agent Deep Deterministic Policy Gradient
Figure 4: Schematic of ROS Communication System
Figure 5: Schematic of the Proposed Workflow
...and 5 more figures
