Optimizing Plastic Waste Collection in Water Bodies Using Heterogeneous Autonomous Surface Vehicles with Deep Reinforcement Learning

Alejandro Mendoza Barrionuevo, Samuel Yanes Luis, Daniel Gutiérrez Reina, Sergio L. Toral Marín

TL;DR

This paper addresses the challenge of locating and collecting floating plastic waste in aquatic environments. The authors propose a model-free DRL framework for informative path planning with a heterogeneous fleet of ASVs divided into scouts and cleaners, coordinated via a shared trash model. The approach introduces a specialized state representation and a tailored reward design, implemented as two team-specific networks trained with Double Deep Q-Learning and prioritized replay. Results across two port-like scenarios show that DRL-based methods outperform heuristics, especially in complex layouts, and that training with greedy actions further improves performance, suggesting strong practical potential despite a higher inference cost.
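As a rough illustration of the learning rule named above, the following sketch computes Double DQN targets and proportional prioritized-replay sampling probabilities for a small batch. It is not the authors' implementation: the function names, the discount `gamma`, the priority exponent `alpha`, the offset `eps`, and the choice of the proportional prioritization variant are all assumptions made for this example, and plain Python lists stand in for network outputs.

```python
from typing import List, Sequence


def double_dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q_online_next: Sequence[Sequence[float]],
    q_target_next: Sequence[Sequence[float]],
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
) -> List[float]:
    """Double DQN: the online network selects the next action,
    the target network evaluates it."""
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        # Action selection uses the online network's Q-values.
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        # Action evaluation uses the target network; no bootstrap at episode end.
        bootstrap = 0.0 if done else gamma * q_tg[best_action]
        targets.append(r + bootstrap)
    return targets


def replay_probabilities(
    td_errors: Sequence[float],
    alpha: float = 0.6,  # assumed priority exponent (hypothetical value)
    eps: float = 1e-6,   # assumed offset so zero-error samples stay reachable
) -> List[float]:
    """Proportional prioritization: sampling probability grows with |TD error|."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


if __name__ == "__main__":
    targets = double_dqn_targets(
        rewards=[1.0, 0.0],
        dones=[False, True],
        q_online_next=[[1.0, 2.0], [3.0, 0.0]],
        q_target_next=[[10.0, 20.0], [30.0, 40.0]],
        gamma=0.5,
    )
    print(targets)
    print(replay_probabilities([0.5, 2.0, 0.1]))
```

In a two-team setup like the one described, each team's network would presumably apply this same update to its own transitions; how the paper shares or separates replay buffers between scouts and cleaners is not stated in this summary.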

Abstract

This paper presents a model-free deep reinforcement learning framework for informative path planning with heterogeneous fleets of autonomous surface vehicles to locate and collect plastic waste. The system employs two teams of vehicles: scouts and cleaners. Coordination between these teams is achieved through a deep reinforcement learning approach, allowing agents to learn strategies that maximize cleaning efficiency. The primary objective is for the scout team to provide an up-to-date contamination model, while the cleaner team collects as much waste as possible following this model. This strategy leads to heterogeneous teams that optimize fleet efficiency through inter-team cooperation supported by a tailored reward function. Several training variants of the proposed algorithm are compared with state-of-the-art heuristics in two distinct scenarios, one with high convexity and another with narrow corridors and challenging access. The results show that deep reinforcement learning-based algorithms outperform the benchmark heuristics and exhibit superior adaptability. In addition, training with greedy actions further enhances performance, particularly in scenarios with intricate layouts.

Paper Structure

This paper contains 11 sections, 2 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Prototype of a scout ASV equipped with a Zed 2i stereo camera and differential GPS. An example of the YOLOv8 trash detection scheme in a real scenario is presented, showing the bounding box of the detection with its predictive confidence. Depth camera triangulation, GPS and heading enable global waste localization.
  • Figure 2: Representation of the two discretized scenario maps, which differ in complexity. Initial deployment positions are marked in red.
  • Figure 3: Example of a state representation, composed of six image-like matrices. They are the input to the neural network.
  • Figure 4: Conceptual diagram of the framework presented in this work. The nexus of cooperation between the two teams is the trash model, which is the input for the DNN of each team. The scout team must provide the updated locations of the waste so that cleaners can collect it. Wider arrows indicate more influence.
  • Figure 5: Visual representations of the terms used in the reward functions.
  • ...and 2 more figures