Control and Coordination of a SWARM of Unmanned Surface Vehicles using Deep Reinforcement Learning in ROS
Shrudhi R S, Sreyash Mohanty, Susan Elias
TL;DR
The paper addresses the challenge of coordinating multiple unmanned surface vehicles (USVs) to clean marine debris in dynamic water environments. It proposes a Multi-Agent Deep Deterministic Policy Gradient (MA-DDPG) framework with centralized critics and decentralized actors, implemented within the ROS and Gazebo simulation stack and integrated via the openai_ros package to connect ROS with OpenAI Gym. Key contributions include the ROS/Gazebo-enabled WAM-V architecture, a concrete observation-action space for USVs, and a tailored reward shaping scheme to promote rapid, safe debris collection while maintaining swarm coordination (e.g., $R = w_1 R_{collect} - w_2 P_{coll} - w_3 P_{time} + w_4 R_{coord}$). The results demonstrate improved coordination, scalability, and stability in simulated swarms, underscoring the practicality of modular, ROS-based RL pipelines for autonomous marine swarm control and their potential extension to other ROS-enabled robotic swarms.
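The weighted reward in the summary above can be sketched as a small function. This is a minimal illustration only: the source gives just the symbolic form, so the weight values, the `RewardWeights` container, and the `shaped_reward` name are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical values; the source does not state numeric weights.
    w1_collect: float = 1.0     # w_1: weight on the debris-collection reward
    w2_collision: float = 1.0   # w_2: weight on the collision penalty
    w3_time: float = 0.01       # w_3: weight on the elapsed-time penalty
    w4_coord: float = 0.1       # w_4: weight on the coordination reward


def shaped_reward(r_collect, p_coll, p_time, r_coord, weights=RewardWeights()):
    """R = w1*R_collect - w2*P_coll - w3*P_time + w4*R_coord."""
    return (
        weights.w1_collect * r_collect
        - weights.w2_collision * p_coll
        - weights.w3_time * p_time
        + weights.w4_coord * r_coord
    )


# Example step: two debris items collected, one collision, ten time units
# elapsed, and a coordination score of five.
step_reward = shaped_reward(r_collect=2.0, p_coll=1.0, p_time=10.0, r_coord=5.0)
```

Keeping the weights in one immutable object makes it easy to rebalance collection speed against safety without touching the reward logic.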
Abstract
An unmanned surface vehicle (USV) can perform complex missions by continuously observing the state of its surroundings and taking actions toward a goal. A swarm of USVs working together can complete missions faster and more effectively than a single USV. In this paper, we propose an autonomous communication model for a swarm of USVs and implement it in software using the Robot Operating System (ROS) and Gazebo. With coordinated task completion as the main objective, we formulate the task decision problem as a Markov decision process (MDP) to achieve efficient localization and tracking in a highly dynamic water environment. To coordinate multiple USVs performing real-time target tracking, we propose an enhanced multi-agent reinforcement learning approach. Our scheme uses Multi-Agent Deep Deterministic Policy Gradient (MA-DDPG), an extension of the Deep Deterministic Policy Gradient (DDPG) algorithm that allows decentralized control of multiple agents in a cooperative environment. Under MA-DDPG's decentralized control, each agent makes decisions based on its own observations and objectives, which can improve overall performance and stability. MA-DDPG also supports communication and coordination among agents through shared readings and rewards.
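The split between decentralized actors and centralized critics that MA-DDPG relies on can be sketched as follows. This is a structural illustration under stated assumptions, not the paper's implementation: linear maps stand in for neural networks, no training step is shown, and the class names and the dimensions (3 agents, 4-element observations, 2-element actions) are hypothetical.

```python
import math
import random


class LinearActor:
    """Decentralized actor: maps one agent's own observation to a bounded action."""

    def __init__(self, obs_dim, act_dim, rng):
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(obs_dim)] for _ in range(act_dim)
        ]

    def act(self, obs):
        # tanh keeps every action component inside [-1, 1].
        return [math.tanh(sum(w * o for w, o in zip(row, obs))) for row in self.weights]


class LinearCentralCritic:
    """Centralized critic: scores the joint observations and actions of all agents."""

    def __init__(self, n_agents, obs_dim, act_dim, rng):
        self.input_dim = n_agents * (obs_dim + act_dim)
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(self.input_dim)]

    def q_value(self, all_obs, all_actions):
        joint = [x for obs in all_obs for x in obs]
        joint += [x for action in all_actions for x in action]
        if len(joint) != self.input_dim:
            raise ValueError("joint input does not match the critic's input size")
        return sum(w * x for w, x in zip(self.weights, joint))


rng = random.Random(0)
N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2  # hypothetical sizes for illustration

actors = [LinearActor(OBS_DIM, ACT_DIM, rng) for _ in range(N_AGENTS)]
critics = [LinearCentralCritic(N_AGENTS, OBS_DIM, ACT_DIM, rng) for _ in range(N_AGENTS)]

observations = [[rng.uniform(-1.0, 1.0) for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]

# Execution: each actor sees only its own observation.
actions = [actor.act(obs) for actor, obs in zip(actors, observations)]

# Training-time evaluation: each critic sees every agent's observation and action.
q_values = [critic.q_value(observations, actions) for critic in critics]
```

Because only the critics need the joint information, the actors can run independently on each vehicle once training is finished.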
