Deep Reinforcement Multi-agent Learning framework for Information Gathering with Local Gaussian Processes for Water Monitoring

Samuel Yanes Luis; Dmitriy Shutin; Juan Marchal Gómez; Daniel Gutiérrez Reina; Sergio Toral Marín

TL;DR

This work addresses efficient water-quality monitoring using a fleet of autonomous surface vehicles by uniting local Gaussian Processes with Deep Reinforcement Learning. Local GPs enable scalable, multimodal environmental modeling by partitioning space and combining local inferences, reducing the typical $O(N^3)$ GP complexity to $O(N^3 / M^2)$. A Double Deep Q-Learning policy, equipped with a consensus-based safety mechanism and a 5-channel visual observation, learns information-maximizing actions under safe coordination, using two information-based reward schemes. Results show substantial improvements in estimation accuracy and training efficiency over global GP baselines and classic path planners across WQP and algae-bloom scenarios, with the mu-change reward often delivering the strongest performance and enabling online fleet retraining in realistic conditions.
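The complexity figure can be checked with a short worked calculation. This is a sketch under two assumptions that the summary does not state explicitly: $M$ is read as the number of local GPs, and the $N$ observations are split evenly among them, so each local model factorizes an $(N/M) \times (N/M)$ kernel matrix.

$$
M \cdot O\!\left(\left(\frac{N}{M}\right)^{3}\right) = O\!\left(\frac{N^{3}}{M^{2}}\right)
$$

As an illustrative case (values chosen only for the arithmetic), $N = 900$ and $M = 3$ give $3 \cdot 300^{3} = 8.1 \times 10^{7}$ operations against $900^{3} = 7.29 \times 10^{8}$ for a single global GP, a reduction by a factor of $M^{2} = 9$.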

Abstract

The conservation of hydrological resources involves continuously monitoring their contamination. This paper proposes a multi-agent system composed of autonomous surface vehicles to monitor water quality efficiently. To control the fleet safely, the fleet policy should be able to act based on both the measurements and the fleet state. Local Gaussian Processes and Deep Reinforcement Learning are used jointly to obtain effective monitoring policies. Unlike classical global Gaussian processes, local Gaussian processes can model information with dissimilar spatial correlations, which captures the water quality information more accurately. A deep convolutional policy is proposed that bases its decisions on observations of the mean and variance of this model, by means of an information gain reward. Using a Double Deep Q-Learning algorithm, agents are trained to minimize the estimation error safely thanks to a consensus-based heuristic. Simulation results indicate an improvement of up to 24% in mean absolute error with the proposed models. In addition, training results with 1-3 agents indicate that the proposed approach yields 20% and 24% smaller average estimation errors for monitoring water quality variables and monitoring algae blooms, respectively, compared to state-of-the-art approaches.
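The abstract names Double Deep Q-Learning as the training algorithm. As a reminder of what distinguishes it from plain Q-learning, here is a minimal NumPy sketch of the standard Double DQN target: the online network selects the next action and the target network evaluates it. The function name, argument layout, and the discount `gamma=0.99` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Standard Double DQN targets for a batch of transitions.

    rewards, dones: shape (batch,)
    q_online_next, q_target_next: shape (batch, n_actions), Q-values at the next state
    gamma: discount factor (0.99 is an assumed placeholder, not from the paper)
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)

    # Action selection uses the online network...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...while action evaluation uses the target network.
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]

    # Terminal transitions (dones == 1) do not bootstrap.
    return rewards + gamma * (1.0 - dones) * evaluated
```

Decoupling selection from evaluation is what reduces the overestimation bias of a single max over noisy Q-values.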

Paper Structure (17 sections, 16 equations, 18 figures, 5 tables, 2 algorithms)

Introduction
Related Work
Statement of the problem
Ground Truths models
Assumptions
Methodology
Local Gaussian Process for estimation
Deep Reinforcement Learning
Observation function
Reward function
Deep Safe Policy for multiagent training
Simulations and Results
Local Gaussian Process performance
DRL fleet training
Comparison with other algorithms
...and 2 more sections

Figures (18)

Figure 1: Example of the ground truths used for every mission. In (a), the WQP map. In (b), an example of an algae scenario with two blooms. In green, $Z_1, Z_2, Z_3$ correspond to the initial deployment zones of the vehicles. The initial position of every vehicle is randomly selected within these areas.
Figure 2: Local GP applied to algae bloom detection with random paths for 3 ASVs. In (a), the local GPs' influence areas and the Ground Truth. In (b), the synthesized model from the local GP $\hat{\mu}(x)$. In (c), the joint predictive uncertainty $\hat{\sigma}(x)$.
Figure 3: Influence areas $\mathcal{I}$ for every vehicle and their corresponding redundancy values $\rho$.
Figure 4: Dueling Neural Network architecture for the Q-function representation. It is composed of an initial visual encoder and two heads: i) the Advantage head and ii) the Value head. The outputs are the 8 Q-values.
Figure 5: Consensus scheme for the safe action selection. At instant $t$, the agent with the highest $Q$ chooses its action first. Then, the second agent takes an action, rejecting any that causes a collision. This is repeated until all agents have decided the next action, and a consensus is reached.
...and 13 more figures
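The consensus scheme in the Figure 5 caption can be sketched in a few lines of Python. This is an illustrative reading, not the authors' implementation: the 8-connected grid move set `MOVES`, the function name `consensus_actions`, and the definition of a collision as two agents targeting the same cell are all assumptions (the source only states that there are 8 Q-values and that colliding actions are rejected).

```python
import numpy as np

# Hypothetical 8-connected move set as (row, col) offsets; assumed, not from the paper.
MOVES = np.array([(-1, 0), (-1, 1), (0, 1), (1, 1),
                  (1, 0), (1, -1), (0, -1), (-1, -1)])

def consensus_actions(q_values, positions):
    """Pick one action per agent, in descending order of each agent's best Q-value.

    q_values: shape (n_agents, 8); positions: shape (n_agents, 2), integer grid cells.
    Returns a list of action indices (None if every move of an agent is blocked).
    """
    q_values = np.asarray(q_values, dtype=float)
    positions = np.asarray(positions)

    # Agents with a higher maximum Q-value decide first.
    order = np.argsort(-q_values.max(axis=1), kind="stable")

    reserved = set()                      # cells already claimed by earlier agents
    actions = [None] * len(positions)
    for agent in order:
        # Try this agent's actions from best to worst Q-value.
        for action in np.argsort(-q_values[agent], kind="stable"):
            target = tuple(int(v) for v in positions[agent] + MOVES[action])
            if target not in reserved:    # reject moves that would collide
                actions[agent] = int(action)
                reserved.add(target)
                break
    return actions
```

In this sketch, when two agents prefer the same target cell, the one with the higher Q-value keeps its preferred move and the other falls back to its next-best non-colliding action. Obstacles and map boundaries are deliberately left out.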
