Reinforcement Learning for Pollution Detection in a Randomized, Sparse and Nonstationary Environment with an Autonomous Underwater Vehicle

Sebastian Zieglmeier, Niklas Erdmann, Narada D. Warakagoda

TL;DR

This work tackles reinforcement learning for locating randomly placed pollution clouds with an autonomous underwater vehicle in reward-sparse, nonstationary environments. To overcome tabular Q-learning's limitations, it develops a modified Monte Carlo-based approach augmented with Hierarchical Reinforcement Learning, Multiple Goal Learning, Trajectory Reward Learning, and Memory As Output Filter (MOF). The results show superior performance over traditional search patterns and Q-learning, with faster search trajectories and robust central-area strategies, demonstrating that carefully engineered RL methods can adapt to sparse, randomized tasks. The findings point to extensions into deep RL and more realistic simulations, widening the applicability of RL to complex, real-world search problems with unknown targets.

Abstract

Reinforcement learning (RL) algorithms are designed to optimize problem-solving by learning actions that maximize rewards, a task that becomes particularly challenging in random and nonstationary environments. Even advanced RL algorithms are often limited in their ability to solve problems in these conditions. In applications such as searching for underwater pollution clouds with autonomous underwater vehicles (AUVs), RL algorithms must navigate reward-sparse environments, where actions frequently result in a zero reward. This paper aims to address these challenges by revisiting and modifying classical RL approaches to efficiently operate in sparse, randomized, and nonstationary environments. We systematically study a large number of modifications, including hierarchical algorithm changes, multigoal learning, and the integration of a location memory as an external output filter to prevent state revisits. Our results demonstrate that a modified Monte Carlo-based approach significantly outperforms traditional Q-learning and two exhaustive search patterns, illustrating its potential in adapting RL to complex environments. These findings suggest that reinforcement learning approaches can be effectively adapted for use in random, nonstationary, and reward-sparse environments.
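
The idea of a location memory acting as an external output filter can be illustrated with a small sketch. This is not the authors' implementation: the four-move action set, the grid representation, the hard exclusion of visited cells, and the fallback rule are all assumptions made for illustration, and the paper's tunable "MOF value" is not modeled here.

```python
from typing import Dict, Set, Tuple

State = Tuple[int, int]

# Hypothetical action set: four moves on a 2D grid (row, column offsets).
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def filtered_greedy_action(
    state: State,
    q_values: Dict[str, float],
    visited: Set[State],
    grid_size: Tuple[int, int],
) -> str:
    """Pick the highest-valued action whose successor cell is unvisited.

    The learned values are left untouched; the memory only filters the
    agent's output. If every in-bounds neighbor was already visited, the
    filter is dropped so the agent can still move (an assumed fallback).
    """
    rows, cols = grid_size
    in_bounds: Dict[str, State] = {}
    for name, (dr, dc) in ACTIONS.items():
        nxt = (state[0] + dr, state[1] + dc)
        if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
            in_bounds[name] = nxt

    unvisited = {a: s for a, s in in_bounds.items() if s not in visited}
    candidates = unvisited or in_bounds
    # Actions missing from q_values are treated as having value 0.0.
    return max(candidates, key=lambda a: q_values.get(a, 0.0))
```

Because the filter sits outside the value table, the same learned values can be evaluated with or without the memory, which makes the effect of the filter easy to isolate.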

Paper Structure

This paper contains 16 sections, 5 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Q-table visualization for static setting: Showing the maximum Q-value of every state for episodes 500, 1000, 2000 (left to right). The last subfigure shows a visualization of the real environment for comparison.
  • Figure 2: Q-table visualization for varying settings: Displaying the maximum Q-value of every state for episodes 1, 500, 1000, 2000 (left to right).
  • Figure 3: From left to right: Training environment with a number of randomly spawned clouds. Evaluation environment with one randomly spawned cloud. Evaluation pattern, Snake and Spiral, respectively, in the evaluation environment.
  • Figure 4: Results of the first parameter tuning loop: Each graph shows the mean performance across 1000 evaluation episodes, each associated with one of the 20 independent runs of our method. The graphs include a 95% confidence interval, shown as the greyed out area around the blue curves. The y-axis presents the performance (mean number of steps until the agent discovers the pollution cloud). From left to right: tuning of the discount factor $\gamma$, option length, MOF value, and number of clouds during training, respectively.
  • Figure 5: The route followed by our agent, visualized with a heat-map coloring representing the number of visits per state (left). Duels won against the Spiral (middle) and against the Snake (right) by generating a cloud with the center in every location of the grid.
  • ...and 3 more figures
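
The captions above refer to a Snake evaluation pattern and to performance measured as the number of steps until the pollution cloud is discovered. A minimal sketch of such a baseline follows. The row-by-row sweep starting in the top-left corner, the grid dimensions, and the representation of a cloud as a set of cells are assumptions for illustration; the paper's exact pattern definition and cloud model may differ.

```python
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]


def snake_path(rows: int, cols: int) -> List[Cell]:
    """Return a row-by-row sweep that reverses direction on every row."""
    path: List[Cell] = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


def steps_until_detection(path: List[Cell], cloud: Set[Cell]) -> Optional[int]:
    """Count moves taken before the path first enters a cloud cell.

    The starting cell counts as step 0. Returns None if the path never
    touches the cloud.
    """
    for step, cell in enumerate(path):
        if cell in cloud:
            return step
    return None


if __name__ == "__main__":
    path = snake_path(2, 3)
    print(path)
    print(steps_until_detection(path, {(1, 1)}))
```

Averaging `steps_until_detection` over many randomly placed clouds gives the kind of mean step count against which a learned search policy can be compared.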