Toward 6-DOF Autonomous Underwater Vehicle Energy-Aware Position Control based on Deep Reinforcement Learning: Preliminary Results

Gustavo Boré, Vicente Sufán, Sebastián Rodríguez-Martínez, Giancarlo Troni

TL;DR

This work tackles energy-efficient, holonomic control of a 6-DOF AUV by training end-to-end DRL policies with Truncated Quantile Critics (TQC). Two variants are proposed: TQC-HP for high-precision reaching and TQC-EA for energy-aware control, both operating directly on eight thrusters without manual thruster modeling. In the Stonefish simulator, TQC-HP consistently improves pose and attitude RMSE over a fine-tuned PID, while TQC-EA achieves roughly 30% lower power usage at a modest cost to accuracy, illustrating a practical trade-off for long-range missions. The results demonstrate the feasibility of DRL-based, energy-conscious low-level control for holonomic AUVs, with field deployment planned for MBARI’s mola platform. End-to-end DRL control of 6-DOF AUVs could significantly extend mission endurance and operational flexibility in challenging underwater environments.

Abstract

Autonomous underwater vehicles (AUVs) play a crucial role in surveying, mapping, and inspecting unexplored underwater areas. Because maneuverability and power efficiency are key to extending the use of these platforms, holonomic six-degrees-of-freedom (6-DOF) vehicles are essential tools. Although Proportional-Integral-Derivative (PID) and Model Predictive Control (MPC) controllers are widely used in these applications, they often require accurate system knowledge, struggle with repeatability when facing payload or configuration changes, and can be time-consuming to fine-tune. While more advanced methods based on Deep Reinforcement Learning (DRL) have been proposed, they are typically limited to operating in fewer degrees of freedom. This paper proposes a novel DRL-based approach for controlling holonomic 6-DOF AUVs using the Truncated Quantile Critics (TQC) algorithm, which does not require manual tuning and directly feeds commands to the thrusters without prior knowledge of their configuration. Furthermore, it incorporates power consumption directly into the reward function. Simulation results show that the TQC High-Performance method outperforms a fine-tuned PID controller when reaching a goal point, while the TQC Energy-Aware method demonstrates slightly lower performance but consumes 30% less power on average.
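To illustrate what folding power consumption into the reward can look like, here is a minimal Python sketch. It is not the paper's reward: the function names, the weights `w_pos`, `w_att`, and `w_pow`, and the power proxy (mean squared normalized thruster command) are all assumptions made for illustration. Only the eight-thruster command vector and the use of an angular distance to the target come from the summary above.

```python
import math


def angular_distance(q_a, q_b):
    """Angle in radians between two unit quaternions given as (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q_a, q_b)))
    # Clamp to guard against rounding slightly above 1.0.
    return 2.0 * math.acos(min(1.0, dot))


def energy_aware_reward(pos, goal_pos, quat, goal_quat, thrust_cmds,
                        w_pos=1.0, w_att=0.5, w_pow=0.1):
    """Hypothetical reward: negative weighted sum of pose error and power.

    The weights and the power proxy are illustrative assumptions,
    not values or terms taken from the paper.
    """
    pos_err = math.dist(pos, goal_pos)            # Euclidean position error
    att_err = angular_distance(quat, goal_quat)   # attitude error
    # Power proxy (assumption): mean squared normalized command in [-1, 1].
    power = sum(u * u for u in thrust_cmds) / len(thrust_cmds)
    return -(w_pos * pos_err + w_att * att_err + w_pow * power)


identity = (1.0, 0.0, 0.0, 0.0)
r_idle = energy_aware_reward((0, 0, 0), (0, 0, 0), identity, identity, [0.0] * 8)
r_full = energy_aware_reward((0, 0, 0), (0, 0, 0), identity, identity, [1.0] * 8)
```

Under this sketch, setting `w_pow=0` yields a purely accuracy-driven reward, while increasing it penalizes thruster effort more heavily. That is one plausible way to express the trade-off between a high-performance and an energy-aware variant.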

Paper Structure

This paper contains 12 sections, 7 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Stonefish model of the mola 6-DOF AUV, MBARI's autonomous research platform for complex terrain exploration.
  • Figure 2: Average reward per episode over a moving window of 100 episodes obtained by the TQC, SAC, and TD3 algorithms during a $2.5\times10^6$-step training run, equivalent to 3125 episodes.
  • Figure 3: Position in the $x$, $y$, and $z$ axes and angular distance to the target, $\theta$, over time during one of the evaluation episodes.
  • Figure 4: 3D trajectory followed by the AUV in the same episode as depicted in Fig. 3, using the PID, TQC-HP, and TQC-EA controllers.
  • Figure 5: Average power consumption and standard deviation of the AUV during the evaluation using the PID, TQC-HP, and TQC-EA controllers.
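The Figure 2 caption implies two small computations: the episode length that follows from the step and episode counts, and the 100-episode moving average used to smooth the reward curve. The sketch below shows both. The function name `moving_average` and the choice to average over fewer episodes until the window fills are assumptions, not details from the paper.

```python
from collections import deque


def moving_average(values, window=100):
    """Trailing mean over at most `window` most recent values."""
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # value about to be evicted by append()
        buf.append(v)
        total += v
        out.append(total / len(buf))
    return out


# Episode length implied by the caption: 2.5e6 steps over 3125 episodes.
steps_per_episode = 2_500_000 // 3125

smoothed = moving_average([1, 2, 3, 4], window=2)
```

With these numbers, each training episode spans 800 steps.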