Learning to Swim: Reinforcement Learning for 6-DOF Control of Thruster-driven Autonomous Underwater Vehicles

Levi Cai; Kevin Chang; Yogesh Girdhar

TL;DR

This paper tackles robust 6-DOF control for thruster-driven AUVs under nonlinear hydrodynamics and variable payloads by training a policy that maps full 6-DOF commands to thruster outputs in a GPU-accelerated, highly parallel simulator. It combines a simplified inertia-based hydrodynamic model with domain randomization to bridge the sim-to-real gap, achieving zero-shot transfer to a real AUV (CUREE) and performance comparable to hand-tuned PID controllers. Key contributions include a GPU-accelerated underwater simulator, the first real-world demonstration of a command-conditioned 6-DOF controller that outputs thruster commands directly, and insights into sim-to-real transfer under varying hydrodynamic conditions. The approach promises rapid, configuration-agnostic controller development for diverse AUV platforms and payload setups, with potential online adaptation in future work.

Abstract

Controlling AUVs can be challenging because of the complex non-linear hydrodynamic forces acting on the robot, which are significant in water and cannot be ignored. The problem is exacerbated for small AUVs, whose dynamics can change significantly with payload changes and with deployments under different hydrodynamic conditions. The common approach to AUV control combines passive stabilization, with added buoyancy on top and weights on the bottom, and a PID controller tuned for simple and smooth motion primitives. However, this approach comes at the cost of sluggish controls and often requires re-tuning the controllers after configuration changes. In this paper, we propose a fast (trainable in minutes), reinforcement learning-based approach for full 6 degree of freedom (DOF) control of thruster-driven AUVs, mapping 6-DOF command-conditioned inputs directly to thruster outputs. We present a new, highly parallelized simulator for underwater vehicle dynamics. We demonstrate this approach through zero-shot sim-to-real transfer (with no tuning) onto a real AUV, producing results comparable to hand-tuned PID controllers. Furthermore, we show that domain randomization in the simulator produces policies that are robust to small variations in the vehicle's physical parameters.
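To make the domain randomization idea concrete, the following is a minimal sketch of how per-environment vehicle parameters might be perturbed before each training episode. It is not the authors' implementation: the `VehicleParams` fields, the nominal values, and the uniform scaling scheme are all assumptions chosen for illustration, and the abstract does not state which parameters are randomized or by how much.

```python
import random
from dataclasses import dataclass


@dataclass
class VehicleParams:
    """Hypothetical physical parameters of a simulated AUV."""
    mass_kg: float
    buoyancy_n: float
    drag_scale: float


# Assumed nominal values, for illustration only (not taken from the paper).
NOMINAL = VehicleParams(mass_kg=10.0, buoyancy_n=100.0, drag_scale=1.0)


def randomize(nominal, spread, rng):
    """Scale each parameter by a factor drawn uniformly from [1 - spread, 1 + spread]."""
    def scale(value):
        return value * rng.uniform(1.0 - spread, 1.0 + spread)

    return VehicleParams(
        mass_kg=scale(nominal.mass_kg),
        buoyancy_n=scale(nominal.buoyancy_n),
        drag_scale=scale(nominal.drag_scale),
    )


def sample_batch(num_envs, spread, seed=0):
    """Draw one independent parameter set per parallel environment."""
    rng = random.Random(seed)
    return [randomize(NOMINAL, spread, rng) for _ in range(num_envs)]


# Four parallel environments, each with parameters within +/-20% of nominal.
batch = sample_batch(num_envs=4, spread=0.2)
```

A policy trained across such a batch sees a different vehicle in every environment, which is the intuition behind its robustness to small parameter variations. Setting `spread=0.0` recovers the nominal vehicle, which would correspond to training without randomization.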

Paper Structure (17 sections, 10 equations, 7 figures, 3 tables)

INTRODUCTION
RELATED WORK
Reinforcement Learning for Control
Simulators for Underwater Vehicles
METHODS
GPU-based Simulation
Simulated Hydrodynamic Model
Learning-based controls for 6-DOF autonomous underwater vehicles
Learning environment and algorithm
Observations, actions, and rewards
Domain transfer (validation in simulation)
Domain transfer (sim-to-real)
Results and Discussion
Training
Evaluating domain transfer in simulation
...and 2 more sections

Figures (7)

Figure 1: Using a highly parallelized training environment, we can rapidly train neural controllers that can take advantage of domain randomization techniques for robust zero-shot deployment on real hardware. For hardware evaluation, we use the CUREE vehicle, shown here traveling through a coral reef environment and interacting with the local wildlife.
Figure 2: Our proposed approach uses on-policy reinforcement learning methods coupled with a highly parallelized underwater simulator to learn a robust control policy, which can be used directly on a robot with no additional training. This takes as input full 6-DOF commands relative to the AUV pose, and directly outputs thruster commands.
Figure 3: Mean reward collected during training with various amounts of domain randomization.
Figure 4: Simulation test results for networks under different parameters, showing that networks trained with domain randomization outperform naive networks when the dynamic parameters are shifted. We use the $l_2$-norm and the quaternion distance described in Equation \ref{eq:quatdistance} to compute the errors.
Figure 5: Real-world testing experiments, showing the neural controller's ability to reject large disturbances while holding position and orientation. Note that the vehicle is ballasted to be positively buoyant, so it must actively hold position. The controller runs on-board the vehicle's Jetson Orin NX and sends low-level motor commands at 20 Hz. A human with a stick drags CUREE aggressively to the side, and the neural controller is able to return. An AprilTag is used for vision-based feedback to provide global position in the tank. This is fused with DVL and IMU feedback for faster position feedback and for when the AprilTag can no longer be seen. Timestamps are in minutes and manually aligned with \ref{fig:neural-disturb}.
...and 2 more figures
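The Figure 4 caption reports position errors with the $l_2$-norm and orientation errors with a quaternion distance. The paper's own quaternion distance equation is not reproduced in this section, so the sketch below uses the geodesic rotation angle between unit quaternions as an assumed stand-in; the function names are hypothetical.

```python
import math


def l2_error(position, reference):
    """Euclidean (l2) distance between a position and its reference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(position, reference)))


def quat_angle(q, q_ref):
    """Rotation angle in radians between two unit quaternions (w, x, y, z).

    Assumed metric, not necessarily the paper's. The absolute value of the
    dot product treats q and -q as the same orientation.
    """
    dot = abs(sum(a * b for a, b in zip(q, q_ref)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return 2.0 * math.acos(dot)


identity = (1.0, 0.0, 0.0, 0.0)
# A 90-degree yaw: rotation by pi/2 about the z axis.
yaw_90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))

position_error = l2_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
orientation_error = quat_angle(identity, yaw_90)
```

Here `position_error` is 5 and `orientation_error` is approximately pi/2, matching the 90-degree yaw offset.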
