
DeepVL: Dynamics and Inertial Measurements-based Deep Velocity Learning for Underwater Odometry

Mohit Singh, Kostas Alexis

TL;DR

DeepVL addresses the challenge of persistent underwater odometry under exteroceptive limitations by learning robot-centric velocity and uncertainty from proprioceptive cues (IMU, motor commands, battery) using a GRU-based ensemble. The velocity predictions and covariances are fused into an EKF, coupled with barometer depth and optional visual features, to achieve reliable long-term odometry and improved VIO robustness with scarce features. The approach uses a lightweight network (~28k parameters) with efficient inference (<5 ms on Orin AGX) and demonstrates strong performance across pool and fjord trials, including scenarios with visual blackout and monocular operation. The work validates the benefit of combining dynamics-aware proprioception with uncertainty-aware learning for resilient underwater localization and control.
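The summary does not say how the ensemble members' outputs are merged into one velocity and covariance. A minimal sketch, assuming each member predicts a per-axis velocity mean and variance (a diagonal covariance), uses the common deep-ensemble rule of moment-matching a uniform Gaussian mixture. The fused variance is the average predicted (aleatoric) variance plus the spread of the member means (epistemic). The function name `combine_ensemble` and the data layout are illustrative assumptions, not the authors' code.

```python
def combine_ensemble(member_means, member_vars):
    """Fuse K ensemble members into one mean and variance per velocity axis.

    member_means[k][i] and member_vars[k][i] are the mean and variance that
    member k predicts for axis i (a diagonal covariance is assumed here).
    """
    if not member_means or len(member_means) != len(member_vars):
        raise ValueError("need at least one member and one variance vector per member")
    n = len(member_means)
    fused_mean, fused_var = [], []
    for i in range(len(member_means[0])):
        means_i = [member_means[k][i] for k in range(n)]
        vars_i = [member_vars[k][i] for k in range(n)]
        mu = sum(means_i) / n
        aleatoric = sum(vars_i) / n                          # average predicted noise
        epistemic = sum((m - mu) ** 2 for m in means_i) / n  # disagreement between members
        fused_mean.append(mu)
        fused_var.append(aleatoric + epistemic)
    return fused_mean, fused_var


# Two hypothetical members predicting (vx, vy, vz) in m/s, variances in (m/s)^2.
mean, var = combine_ensemble(
    [[0.25, 0.0, 1.0], [0.75, 0.0, 1.0]],
    [[0.125, 0.5, 0.5], [0.375, 0.5, 0.5]],
)
print(mean, var)  # [0.5, 0.0, 1.0] [0.3125, 0.5, 0.5]
```

When the members disagree (axis x above), the fused variance grows beyond what any single member reports. This is the behaviour that lets a filter down-weight the learned velocity when the input lies outside the training distribution.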

Abstract

This paper presents a learned model that predicts the robot-centric velocity of an underwater robot through dynamics-aware proprioception. The method uses a recurrent neural network that takes as input inertial cues, motor commands, and battery voltage readings, together with its hidden state from the previous time step, and outputs robust velocity estimates and their associated uncertainty. An ensemble of networks is used to improve the velocity and uncertainty predictions. By fusing the network's outputs into an Extended Kalman Filter alongside inertial predictions and barometer updates, the method enables long-term underwater odometry without further exteroception. Furthermore, when integrated into visual-inertial odometry, the method improves estimation resilience while tracking an order of magnitude fewer features in total (as few as 1) than conventional visual-inertial systems. Tested onboard an underwater robot deployed both in a laboratory pool and in the Trondheim Fjord, the method takes less than 5 ms for inference on either the CPU or the GPU of an NVIDIA Orin AGX. It achieves less than 4% relative position error on novel trajectories during complete visual blackout, and approximately 2% relative error when at most 2 visual features from a monocular camera are available.
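The abstract describes fusing the network's velocity and uncertainty into an EKF together with barometer updates, but it does not give the filter equations. The sketch below shows only a measurement-update step, under simplifying assumptions that are ours and not the paper's. The state is a 6-element [p, v] in a world frame whose z axis points down, so relative depth observes p_z directly. Orientation is treated as known (for example from IMU attitude), so the body-frame velocity measurement v_body = R_wb^T v is linear in v. The network covariance is diagonal, which allows each axis to be processed as a sequential scalar update. Orientation and bias states, the IMU propagation step, and all function names here are hypothetical.

```python
import math


def scalar_update(x, P, h, z, r):
    """Kalman update for one scalar measurement z = h . x + noise with variance r."""
    n = len(x)
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]  # P h^T
    s = sum(h[i] * Ph[i] for i in range(n)) + r                      # innovation variance
    k = [Ph[i] / s for i in range(n)]                                # Kalman gain
    innov = z - sum(h[i] * x[i] for i in range(n))
    x_new = [x[i] + k[i] * innov for i in range(n)]
    # P <- P - K (h P); (h P) equals (P h^T)^T because P is symmetric.
    P_new = [[P[i][j] - k[i] * Ph[j] for j in range(n)] for i in range(n)]
    return x_new, P_new


def yaw_rotation(yaw):
    """Body-to-world rotation for a pure yaw angle (radians)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def velocity_and_depth_update(x, P, R_wb, v_body, v_var, depth, depth_var):
    """Fuse a learned body-frame velocity (diagonal variance) and a relative depth."""
    # v_body[i] = sum_j R_wb[j][i] * v_world[j], so row i of H is column i of R_wb.
    for i in range(3):
        h = [0.0, 0.0, 0.0] + [R_wb[0][i], R_wb[1][i], R_wb[2][i]]
        x, P = scalar_update(x, P, h, v_body[i], v_var[i])
    # Barometer: relative depth observes the z position directly (z axis down).
    x, P = scalar_update(x, P, [0.0, 0.0, 1.0, 0.0, 0.0, 0.0], depth, depth_var)
    return x, P


x0 = [0.0] * 6
P0 = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
x1, P1 = velocity_and_depth_update(
    x0, P0, yaw_rotation(0.0), [1.0, 0.0, 0.0], [1.0, 1.0, 1.0], 2.0, 1.0
)
print(x1[3], x1[2], P1[3][3])  # 0.5 1.0 0.5
```

With equal prior and measurement variance, each update moves the estimate halfway to the measurement and halves its variance. Processing the axes one at a time gives the same result as a batch update because the measurement noise is assumed uncorrelated across axes.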

Paper Structure

This paper contains 23 sections, 10 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: The custom underwater robot used in this work, with 5-camera visual-inertial sensing, alongside an overlay of the trajectory estimated by the proposed method and the ground-truth VIO trajectory on a top-down view of a pier in the Trondheim Fjord.
  • Figure 2: DeepVL method overview, with IMU measurements, motor commands, and battery voltage as proprioceptive inputs to the ensemble of recurrent neural networks. The output velocity and covariance, alongside the relative barometric depth measurement, are then used in an EKF for robot state estimation.
  • Figure 3: Detailed analysis of trajectory $5$, collected in the Trondheim Fjord. a) Odometry estimates from DeepVL, from VIO with $1$ feature, and from the fusion of DeepVL with VIO using $1$ feature. b) Images from the Alphasense camera stream at multiple locations along the trajectory. c) Tabular comparison of RPE with the maximum number of features ranging from $0$ to $8$ ('X' indicates divergence, while '-' indicates a test that was not run because it would not be meaningful). On the right, the evolution of position, accelerometer biases, uncertainty estimates, and RPE is shown.
  • Figure 4: A combined plot of all $8$ evaluation trajectories along with the corresponding DeepVL odometry estimates.
  • Figure 5: Aggregate RPE over all trajectories with the monocular (left) and stereo (right) camera, analyzing the effect of incrementally increasing the maximum number of features used in VIO with and without the integration of DeepVL.
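Figures 3 and 5 report RPE, and the abstract quotes it as a percentage, but this summary does not give the exact definition. As a hedged illustration, one simple drift-style variant divides the final position error by the distance travelled along the ground-truth path. Segment-based variants that average over fixed-length sub-trajectories are also common, and the paper may use one of those instead. The function names are hypothetical.

```python
import math


def path_length(positions):
    """Total distance travelled along a sequence of 3-D positions."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def relative_position_error(estimate, ground_truth):
    """Final position error as a percentage of ground-truth distance travelled.

    Both trajectories must be time-aligned and expressed in the same frame.
    """
    if len(estimate) != len(ground_truth) or len(ground_truth) < 2:
        raise ValueError("need two time-aligned trajectories with at least 2 poses")
    length = path_length(ground_truth)
    if length == 0.0:
        raise ValueError("ground-truth trajectory has zero length")
    return 100.0 * math.dist(estimate[-1], ground_truth[-1]) / length


gt = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 10.0, 0.0)]
est = [(0.0, 0.0, 0.0), (10.0, 0.2, 0.0), (10.0, 10.5, 0.0)]
print(relative_position_error(est, gt))  # 2.5
```

Here the estimate ends 0.5 m from the ground truth after a 20 m path, which gives 2.5%. Under this definition, a figure such as "less than 4%" means less than 4 m of end-point drift per 100 m travelled.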