Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning
Davide Corsi, Davide Camponogara, Alessandro Farinelli
TL;DR
The paper tackles the problem of training deep reinforcement learning (DRL) agents for aquatic navigation in non-stationary water environments and proposes a Unity3D-based simulator that supports both underwater and surface scenarios. It presents a PPO-based training pipeline augmented with curriculum learning, learnable hyperparameters, and safety-oriented reward shaping, and reports an extensive set of ablations to establish baselines. The key contributions are a realistic, open-source aquatic benchmark; a configurable training pipeline that combines curriculum learning, dense rewards, and safety considerations; and validation on a photogrammetry-derived model of the Porth Yr Ogof cave. The findings indicate that curriculum learning can improve safety and generalization, that dense rewards are essential for convergence, and that the combined method yields promising policies, although generalization and safety remain open challenges. Overall, the work delivers a practical, reproducible platform for advancing safe DRL in aquatic robotics and invites collaboration across research groups and application domains.
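To make the idea of a dense, safety-oriented reward concrete, here is a minimal Python sketch. It assumes a goal-reaching task in which the agent knows its previous and current positions, the goal position, and the distance to the nearest obstacle. The `RewardConfig` class, the `shaped_reward` function, and every coefficient are hypothetical illustrations, not the reward function used in the paper. The progress term gives feedback on every step (not only at the end of an episode), while the proximity term penalizes the agent for entering a safety margin around obstacles.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    # All coefficients are hypothetical placeholders, not values from the paper.
    progress_scale: float = 1.0     # weight of the dense distance-progress term
    time_penalty: float = 0.01      # small per-step cost that discourages dawdling
    goal_bonus: float = 1.0         # terminal reward for reaching the goal
    collision_penalty: float = 1.0  # terminal penalty for hitting an obstacle
    goal_radius: float = 0.5        # distance at which the goal counts as reached
    safety_margin: float = 1.0      # obstacle distance below which a penalty applies
    proximity_scale: float = 0.1    # maximum size of the proximity penalty


def shaped_reward(prev_pos, pos, goal, min_obstacle_dist, cfg=RewardConfig()):
    """Return (reward, done) for one transition of a goal-reaching task.

    prev_pos, pos and goal are coordinate tuples of equal length
    (2-D for a surface vehicle, 3-D for an underwater one).
    """
    # Safety first: any contact with an obstacle ends the episode.
    if min_obstacle_dist <= 0.0:
        return -cfg.collision_penalty, True

    d_prev = math.dist(prev_pos, goal)
    d_now = math.dist(pos, goal)

    # Terminal bonus when the goal region is reached.
    if d_now <= cfg.goal_radius:
        return cfg.goal_bonus, True

    # Dense progress term: positive when the agent moves closer to the goal.
    reward = cfg.progress_scale * (d_prev - d_now) - cfg.time_penalty

    # Safety shaping: penalty grows linearly as the agent enters the margin.
    if min_obstacle_dist < cfg.safety_margin:
        intrusion = (cfg.safety_margin - min_obstacle_dist) / cfg.safety_margin
        reward -= cfg.proximity_scale * intrusion

    return reward, False


# Example: one step toward a goal 10 m away, with an obstacle 0.5 m away.
r, done = shaped_reward((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), 0.5)
print(round(r, 3), done)  # 0.94 False
```

A purely sparse variant would return zero on every non-terminal step, leaving the agent without feedback until it happens to reach the goal; the per-step progress term is what makes this reward dense.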
Abstract
An exciting and promising frontier for Deep Reinforcement Learning (DRL) is its application to real-world robotic systems. While modern DRL approaches have achieved remarkable successes in many robotic scenarios (including mobile robotics, surgical assistance, and autonomous driving), unpredictable and non-stationary environments can pose critical challenges to such methods. These features can significantly undermine fundamental requirements for a successful training process, such as the Markov property of the transition model. To address this challenge, we propose a new benchmarking environment for aquatic navigation that builds on recent advances in the integration between game engines and DRL. In more detail, we show that our benchmarking environment is challenging even for state-of-the-art DRL approaches, which may struggle to generate reliable policies in terms of generalization power and safety. Specifically, we focus on PPO, one of the most widely adopted algorithms, and we propose advanced training techniques (such as curriculum learning and learnable hyperparameters). Our extensive empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results. Our simulation environment and training baselines are freely available to facilitate further research on this open problem and encourage collaboration in the field.
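Since the training pipeline is built on PPO, a short reminder of its core update may help. The sketch below computes the standard PPO clipped surrogate objective, L_CLIP = E_t[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)] with r_t = pi_theta(a_t | s_t) / pi_old(a_t | s_t), in pure Python over a batch of log-probabilities and precomputed advantage estimates. The function name `ppo_clip_loss` and the default eps = 0.2 are illustrative assumptions; the curriculum learning and learnable hyperparameters proposed in the paper are not modeled here.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negated PPO clipped surrogate objective averaged over a batch.

    new_log_probs: log pi_theta(a_t | s_t) under the policy being optimized.
    old_log_probs: log pi_old(a_t | s_t) under the policy that collected the data.
    advantages:    advantage estimates A_t (assumed precomputed, e.g. with GAE).
    Returns a scalar to minimize (the negative of the surrogate to maximize).
    """
    n = len(advantages)
    if n == 0 or not (len(new_log_probs) == len(old_log_probs) == n):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r_t = pi_theta / pi_old, computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # clip(r_t, 1 - eps, 1 + eps)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return -total / n


# A ratio of 2 with a positive advantage is capped at 1 + eps = 1.2.
print(round(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]), 6))  # -1.2
```

Computing the ratio from a difference of log-probabilities avoids dividing small probabilities directly. A practical implementation would express the same loss with a tensor library so that it can be differentiated with respect to the policy parameters.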
