Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning

Nicolò Dal Fabbro, Milad Mesbahi, Renato Mendes, João Borges de Sousa, George J. Pappas

TL;DR

This work tackles long-term mapping of a dynamic river plume with energy-constrained autonomous underwater vehicles. It couples spatiotemporal Gaussian process regression for salinity estimation with a centralized multi-agent reinforcement learning controller that decomposes actions into per-AUV direction and speed, using intermittent communication to manage energy use. The GP uses a separable kernel with a fixed mean $m(x,t)=f_{\mathrm{ocn}}$ and a posterior mean $\hat{f}(x,t|\mathcal{M}_k^N)=f_{\mathrm{ocn}}+k_* \bar{K}^{-1}(f(D_k^N)-f_{\mathrm{ocn}})$, while the MARL policy employs a two-headed DQN over a discretized action space and a CNN-based state representation that fuses the GP map, trajectories, and wind. Empirical results on a Delft3D Douro plume model show that the proposed method achieves lower MSE than baselines and that doubling the fleet size can more than double mission endurance, with robust generalization to unseen seasonal regimes, highlighting practical potential for data-driven, long-duration coastal monitoring.
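The posterior mean above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The squared-exponential form of both kernel factors, the hyperparameter values, the reading of $\bar{K}$ as the kernel matrix plus a noise term, and the names `separable_kernel` and `posterior_mean` are all assumptions made for illustration.

```python
import numpy as np

def separable_kernel(X1, t1, X2, t2, ls_space=1.0, ls_time=1.0, var=1.0):
    """Separable kernel k((x,t),(x',t')) = var * k_s(x,x') * k_t(t,t').

    Both factors are squared-exponential here, which is an assumption;
    the summary only states that the kernel is separable.
    """
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)  # squared spatial distances
    dt2 = (t1[:, None] - t2[None, :]) ** 2                      # squared time differences
    return var * np.exp(-0.5 * d2 / ls_space**2) * np.exp(-0.5 * dt2 / ls_time**2)

def posterior_mean(Xq, tq, X, t, y, f_ocn, noise_var=1e-2, **kernel_args):
    """GP posterior mean with fixed prior mean f_ocn:
    f_hat = f_ocn + k_* K_bar^{-1} (y - f_ocn).

    K_bar is taken to be K + noise_var * I (assumption).
    """
    K_bar = separable_kernel(X, t, X, t, **kernel_args) + noise_var * np.eye(len(y))
    k_star = separable_kernel(Xq, tq, X, t, **kernel_args)
    # Solve the linear system instead of forming an explicit inverse.
    alpha = np.linalg.solve(K_bar, y - f_ocn)
    return f_ocn + k_star @ alpha

# Hypothetical data: two salinity samples taken at the same time, ocean salinity 35.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
t = np.array([0.0, 0.0])
y = np.array([30.0, 32.0])
f_ocn = 35.0

# Far from every measurement the estimate falls back to the ocean value.
far = posterior_mean(np.array([[100.0, 100.0]]), np.array([0.0]), X, t, y, f_ocn)
# At a measured location with very small noise the estimate matches the sample.
near = posterior_mean(X[:1], t[:1], X, t, y, f_ocn, noise_var=1e-8)
```

With the fixed mean $f_{\mathrm{ocn}}$, unvisited regions default to open-ocean salinity, so the map departs from that value only where measurements indicate the plume.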

Abstract

We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro River as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, and that scaling the number of agents improves both mean squared error (MSE) and operational endurance. In some instances, doubling the number of AUVs more than doubles endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.

Paper Structure

This paper contains 7 sections, 19 equations, 12 figures, 1 table, 1 algorithm.

Figures (12)

  • Figure 1: Plume monitoring setting. The Douro River discharges into the Atlantic Ocean, generating a dynamic salinity plume (whose edge is represented by a red boundary). Multiple AUVs (black arrows) collect trajectory-constrained measurements and intermittently communicate with a central server to coordinate.
  • Figure 2: Illustrative example of 10 hours of spatiotemporal evolution of the Douro River plume and AUV mobility (red trajectory) on March 2, 2018. The AUV uses the propulsion that provides the nominal speed of $1\,\mathrm{m/s}$ (in the absence of ocean flow). Note that the AUV mobility is affected by the currents.
  • Figure 3: Salinity map of the Douro River plume (left) and a visualization of the ocean flow (right). Note the correlation between the salinity map and the distribution of the currents, and that the current speed exceeds $1\,\mathrm{m/s}$, the nominal speed of the AUVs.
  • Figure 4: Light autonomous underwater vehicles.
  • Figure 5: System architecture.
  • ...and 7 more figures