DeepVL: Dynamics and Inertial Measurements-based Deep Velocity Learning for Underwater Odometry
Mohit Singh, Kostas Alexis
TL;DR
DeepVL addresses persistent underwater odometry under limited exteroception by learning robot-centric velocity and its uncertainty from proprioceptive cues (IMU, motor commands, battery voltage) using a GRU-based ensemble. The velocity predictions and their covariances are fused in an Extended Kalman Filter (EKF) together with barometer depth and, optionally, visual features, yielding reliable long-term odometry and more robust VIO when features are scarce. The network is lightweight (~28k parameters), runs inference in under 5 ms on an NVIDIA Orin AGX, and performs well across pool and fjord trials, including complete visual blackout and monocular operation. The work shows the benefit of combining dynamics-aware proprioception with uncertainty-aware learning for resilient underwater localization and control.
Abstract
This paper presents a learned model to predict the robot-centric velocity of an underwater robot through dynamics-aware proprioception. The method uses a recurrent neural network that takes inertial cues, motor commands, and battery voltage readings, alongside the hidden state of the previous time-step, as inputs and outputs robust velocity estimates with their associated uncertainty. An ensemble of networks is utilized to enhance the velocity and uncertainty predictions. By fusing the network's outputs into an Extended Kalman Filter, alongside inertial predictions and barometer updates, the method enables long-term underwater odometry without further exteroception. Furthermore, when integrated into visual-inertial odometry, the method improves estimation resilience while tracking an order of magnitude fewer features (as few as 1) than conventional visual-inertial systems. Tested onboard an underwater robot deployed both in a laboratory pool and in the Trondheim Fjord, the method takes less than 5 ms for inference on either the CPU or the GPU of an NVIDIA Orin AGX and demonstrates less than 4% relative position error on novel trajectories during complete visual blackout, and approximately 2% relative error when a maximum of 2 visual features from a monocular camera are available.
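To make the fusion idea concrete, the following is a minimal sketch of how ensemble velocity predictions and their uncertainties could feed a filter update. It is an illustration under simplifying assumptions, not the paper's implementation: it treats a single velocity axis, combines ensemble members by moment-matching a uniform Gaussian mixture (a common choice for deep ensembles, assumed here), and applies a scalar linear Kalman measurement update rather than the full EKF. The function names `combine_ensemble` and `kalman_velocity_update` and all numeric values are hypothetical.

```python
from statistics import fmean


def combine_ensemble(means, variances):
    """Combine per-member (mean, variance) predictions for one velocity axis.

    Assumption: members are weighted equally and the result is the
    moment-matched Gaussian of the resulting mixture.
    """
    mu = fmean(means)
    # Total variance = average predicted variance + spread of the member means.
    var = fmean(v + m * m for v, m in zip(variances, means)) - mu * mu
    return mu, var


def kalman_velocity_update(v_prior, p_prior, z, r):
    """Scalar Kalman measurement update with a direct velocity measurement.

    v_prior, p_prior: prior velocity estimate and its variance.
    z, r: measured velocity (e.g. the combined ensemble mean) and its variance.
    """
    k = p_prior / (p_prior + r)           # Kalman gain
    v_post = v_prior + k * (z - v_prior)  # corrected velocity
    p_post = (1.0 - k) * p_prior          # reduced variance
    return v_post, p_post


# Hypothetical ensemble output for the surge axis (m/s and (m/s)^2).
member_means = [0.48, 0.52, 0.50]
member_vars = [0.01, 0.01, 0.01]

z, r = combine_ensemble(member_means, member_vars)
v_post, p_post = kalman_velocity_update(v_prior=0.30, p_prior=0.04, z=z, r=r)
```

Because the combined variance grows when ensemble members disagree, the gain `k` shrinks in that case and the filter leans more on its inertial prediction, which is the practical benefit of propagating a learned uncertainty into the update.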
