Toward 6-DOF Autonomous Underwater Vehicle Energy-Aware Position Control based on Deep Reinforcement Learning: Preliminary Results
Gustavo Boré, Vicente Sufán, Sebastián Rodríguez-Martínez, Giancarlo Troni
TL;DR
This work tackles energy-efficient, holonomic control of a 6-DOF AUV by training end-to-end DRL policies with Truncated Quantile Critics (TQC). Two variants are proposed: tqc-hp for high-performance goal reaching and tqc-ea for energy-aware control, both commanding eight thrusters directly without manual thruster modeling. In the Stonefish simulator, tqc-hp consistently improves pose and attitude RMSE over a fine-tuned PID, while tqc-ea uses roughly 30% less power at a modest cost to accuracy, illustrating a practical trade-off for long-range missions. The results demonstrate the feasibility of DRL-based, energy-conscious low-level control for holonomic AUVs, with field deployment planned for MBARI’s mola platform. End-to-end DRL control of 6-DOF AUVs could significantly extend mission endurance and operational flexibility in challenging underwater environments.
Abstract
Autonomous underwater vehicles (AUVs) play a crucial role in surveying, mapping, and inspecting unexplored underwater areas. Because maneuverability and power efficiency are key to extending the use of these platforms, holonomic six-degrees-of-freedom (6-DOF) vehicles are essential tools. Although Proportional-Integral-Derivative (PID) controllers and Model Predictive Control are widely used in these applications, they often require accurate system knowledge, struggle with repeatability when facing payload or configuration changes, and can be time-consuming to fine-tune. More advanced methods based on Deep Reinforcement Learning (DRL) have been proposed, but they are typically limited to operating in fewer degrees of freedom. This paper proposes a novel DRL-based approach for controlling holonomic 6-DOF AUVs using the Truncated Quantile Critics (TQC) algorithm, which does not require manual tuning and directly feeds commands to the thrusters without prior knowledge of their configuration. Furthermore, it incorporates power consumption directly into the reward function. Simulation results show that the TQC High-Performance method achieves better performance than a fine-tuned PID controller when reaching a goal point, while the TQC Energy-Aware method demonstrates slightly lower performance but consumes 30% less power on average.
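The idea of folding power consumption into the reward can be illustrated with a minimal Python sketch. This is not the authors' reward function: the function name, the weights `w_pos`, `w_att`, and `w_power`, and the use of the mean squared thruster command as a power proxy are all assumptions made for illustration only.

```python
import math
from typing import Sequence


def energy_aware_reward(
    pos_error: Sequence[float],    # position error, 3 components (hypothetical units: m)
    att_error: Sequence[float],    # attitude error, 3 components (hypothetical units: rad)
    thrust_cmds: Sequence[float],  # normalized thruster commands, assumed in [-1, 1]
    w_pos: float = 1.0,            # hypothetical weight on position error
    w_att: float = 0.5,            # hypothetical weight on attitude error
    w_power: float = 0.1,          # hypothetical weight on the power proxy
) -> float:
    """Return a scalar reward that penalizes tracking error and power use."""
    # Euclidean norms of the position and attitude errors.
    pos_term = math.sqrt(sum(e * e for e in pos_error))
    att_term = math.sqrt(sum(e * e for e in att_error))
    # Assumed power proxy: mean squared thruster command.
    power_term = sum(u * u for u in thrust_cmds) / len(thrust_cmds)
    return -(w_pos * pos_term + w_att * att_term + w_power * power_term)


# Same tracking error, different thruster effort on eight thrusters.
gentle = energy_aware_reward([0.1, 0.0, 0.0], [0.0, 0.0, 0.05], [0.2] * 8)
aggressive = energy_aware_reward([0.1, 0.0, 0.0], [0.0, 0.0, 0.05], [0.9] * 8)
```

Under these assumptions, setting `w_power=0.0` gives a reward driven by tracking error alone, loosely analogous to a performance-oriented variant, while a positive `w_power` makes the policy trade some accuracy for lower thruster effort.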
