GeVI-SLAM: Gravity-Enhanced Stereo Visual Inertial SLAM for Underwater Robots

Yuan Shen, Yuze Hong, Guangyang Zeng, Tengfei Zhang, Pui Yi Chui, Ziyang Hong, Junfeng Wu

TL;DR

GeVI-SLAM addresses underwater VI SLAM challenges by exploiting a gravity prior to decouple roll and pitch from pose estimation, reducing tracking to a $4$-DOF problem (yaw and translation) solved with a bias-eliminated, $\sqrt{n}$-consistent estimator. A $3$-point minimal-set RANSAC enables fast and robust outlier rejection, while adaptive fusion of the gravity prior with IMU measurements yields drift-free roll and pitch and accurate $6$-DOF refinement. Stereo initialization provides scale observability and gravity alignment, and extensive experiments show improved ATE and RPE over state-of-the-art baselines on both synthetic and real underwater data. The work enables robust, real-time underwater navigation that is resilient to texture degeneracy and low accelerations, although the current form includes neither loop closure nor refractive modeling.
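To make the $4$-DOF reduction concrete, the following is a minimal sketch, not the authors' implementation. It assumes a gravity-aligned frame whose z-axis points along gravity, so that once roll and pitch are fixed by the gravity prior, the remaining unknowns are a yaw angle and a 3D translation. The function names `rot_z` and `pose_4dof` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rot_z(yaw: float) -> np.ndarray:
    """Rotation by `yaw` radians about the z-axis (assumed gravity axis)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def pose_4dof(yaw: float, translation) -> np.ndarray:
    """Build a 4x4 homogeneous transform from 1 yaw angle + 3 translations.

    Assumption: roll and pitch are already known from gravity, so the
    rotation part is a pure rotation about the gravity-aligned z-axis.
    """
    T = np.eye(4)
    T[:3, :3] = rot_z(yaw)
    T[:3, 3] = np.asarray(translation, dtype=float)
    return T


if __name__ == "__main__":
    T = pose_4dof(np.deg2rad(30.0), [1.0, 2.0, -0.5])
    gravity_axis = np.array([0.0, 0.0, 1.0])
    # A 4-DOF pose never tilts the gravity axis.
    print(np.allclose(T[:3, :3] @ gravity_axis, gravity_axis))
```

Under this parameterization the pose has four free parameters instead of six, which is what makes a smaller minimal sample set possible.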

Abstract

Accurate visual inertial simultaneous localization and mapping (VI SLAM) for underwater robots remains a significant challenge due to frequent visual degeneracy and insufficient inertial measurement unit (IMU) motion excitation. In this paper, we present GeVI-SLAM, a gravity-enhanced stereo VI SLAM system designed to address these issues. By leveraging the stereo camera's direct depth estimation ability, we eliminate the need to estimate scale during IMU initialization, enabling stable operation even under low acceleration dynamics. With precise gravity initialization, we decouple the pitch and roll from the pose estimation and solve a 4 degrees of freedom (DOF) Perspective-n-Point (PnP) problem for pose tracking. This allows the use of a minimal 3-point solver, which significantly reduces computational time to reject outliers within a Random Sample Consensus framework. We further propose a bias-eliminated 4-DOF PnP estimator with provable consistency, ensuring the relative pose converges to the true value as the feature number increases. To handle dynamic motion, we refine the full 6-DOF pose while jointly estimating the IMU covariance, enabling adaptive weighting of the gravity prior. Extensive experiments on simulated and real-world data demonstrate that GeVI-SLAM achieves higher accuracy and greater stability compared to state-of-the-art methods.
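One way to see why a smaller minimal set speeds up outlier rejection is the standard RANSAC iteration bound, $N = \lceil \log(1-p) / \log(1-w^s) \rceil$, where $w$ is the inlier ratio, $s$ the sample size, and $p$ the desired confidence. This bound is textbook RANSAC rather than a result taken from the paper, and the function name `ransac_iterations`, the inlier ratio, and the confidence level below are illustrative assumptions.

```python
import math


def ransac_iterations(inlier_ratio: float, sample_size: int,
                      confidence: float = 0.99) -> int:
    """Iterations needed so that, with probability `confidence`,
    at least one random sample of `sample_size` points is outlier-free."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # Probability that a single sample contains only inliers.
    p_all_inliers = inlier_ratio ** sample_size
    if p_all_inliers >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence)
                     / math.log(1.0 - p_all_inliers))


if __name__ == "__main__":
    for s in (3, 5):
        print(s, ransac_iterations(inlier_ratio=0.5, sample_size=s))
```

With an assumed 50% inlier ratio and 99% confidence, the bound gives 35 iterations for 3-point samples versus 146 for 5-point samples. The actual savings in GeVI-SLAM also depend on the per-hypothesis solver cost, which this sketch does not model.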

Paper Structure

This paper contains 24 sections, 3 theorems, 13 equations, 6 figures, 1 table.

Key Result

Theorem 1

The BE estimator $\hat{\bf x}^\mathrm{BE}$ is a $\sqrt{n}$-consistent estimator for the true state vector ${\bf x}^o$.
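For readers unfamiliar with the terminology, the statement can be unpacked with the standard definitions below. The paper's own Definition 1 is not reproduced on this page, so this is the usual textbook form and may differ in notation; here $n$ is taken to be the number of features, as suggested by the abstract.

```latex
% Big O_p (stochastic boundedness), standard definition:
% X_n = O_p(a_n) if for every \varepsilon > 0 there exist M > 0 and N such that
\[
  \Pr\!\left( \left\lVert X_n / a_n \right\rVert > M \right) < \varepsilon
  \quad \text{for all } n > N .
\]
% \sqrt{n}-consistency of the BE estimator then means
\[
  \hat{\bf x}^{\mathrm{BE}} - {\bf x}^o = O_p\!\left( 1/\sqrt{n} \right),
\]
% i.e. the estimation error shrinks at the rate 1/\sqrt{n} in probability.
```

In particular, $\sqrt{n}$-consistency implies ordinary consistency: the estimate converges in probability to ${\bf x}^o$ as $n \to \infty$.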

Figures (6)

  • Figure 1: Estimated trajectory (blue line) using the proposed GeVI-SLAM system, visualized on a dense 3D reconstruction. It demonstrates our algorithm's robustness against common underwater challenges: (a) feature-sparse, coplanar scenes and (b) highly repetitive patterns.
  • Figure 2: 4-DOF pose estimation based on point-to-epipolar-line distances.
  • Figure 3: System architecture of the proposed stereo-inertial odometry framework.
  • Figure 4: (a)-(b): Our 4-DOF PnP estimator achieves CRLB accuracy, outperforming other PnP methods. (c): Our 4-DOF minimal-set consensus filter achieves better accuracy at a lower computational cost than the 5-point method. (d): Our adaptive fusion framework maintains drift-free roll and pitch estimates.
  • Figure 5: Illustration of feature matching: ORB-SLAM3 (ORB) vs. SVIN2 (BRISK). The green denotes inliers, while the yellow indicates outliers.
  • ...and 1 more figures

Theorems & Definitions (5)

  • Definition 1: Big $O_p$
  • Theorem 1
  • proof
  • Lemma 1
  • Lemma 2: mu2017globally