SP-VINS: A Hybrid Stereo Visual Inertial Navigation System based on Implicit Environmental Map

Xueyu Du, Lilian Zhang, Fuan Duan, Xincan Luo, Maosong Wang, Wenqi Wu, Jun Mao

TL;DR

SP-VINS tackles long-term drift in filter-based visual-inertial navigation by replacing the 3D map with an implicit environmental map built from keyframes and 2D keypoints. It uses a hybrid residual framework that combines landmark reprojection and ray-depth constraints within a DST-EKF, and it calibrates the camera-IMU extrinsic parameters online to handle degraded environments. A loop-closure module uses the implicit map to correct drift without pose-graph optimization or 3D mapping, which reduces computation. Benchmark results on EuRoC, TUM-VI, and KAIST-Urban show that SP-VINS achieves long-term, high-accuracy localization with lower computational overhead than state-of-the-art SLAM systems, making it well suited to resource-constrained platforms.

Abstract

Filter-based visual-inertial navigation systems (VINS) have attracted mobile-robot researchers for their good balance between accuracy and efficiency, but their limited mapping quality hampers long-term, high-accuracy state estimation. To this end, we first propose a novel filter-based stereo VINS that, unlike traditional simultaneous localization and mapping (SLAM) systems based on a 3D map, applies efficient loop-closure constraints using an implicit environmental map composed of keyframes and 2D keypoints. Second, we propose a hybrid residual filter framework that combines landmark reprojection and ray constraints to construct a unified Jacobian matrix for measurement updates. Finally, to handle degraded environments, we incorporate the camera-IMU extrinsic parameters into the visual description to achieve online calibration. Benchmark experiments demonstrate that the proposed SP-VINS achieves high computational efficiency while maintaining long-term, high-accuracy localization performance, and that it is superior to existing state-of-the-art (SOTA) methods.
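
To illustrate what combining two residual types into a unified Jacobian for a single measurement update can look like, the sketch below stacks two residual blocks and applies one standard EKF update. It is a minimal illustration under stated assumptions, not the paper's DST-EKF: the function name `stacked_ekf_update`, the `(r, H, R)` block layout, and the additive state correction are all hypothetical, and the reprojection and ray residuals and their Jacobians are assumed to be computed elsewhere.

```python
import numpy as np


def stacked_ekf_update(x, P, blocks):
    """One EKF update from several residual blocks (hypothetical helper).

    blocks: list of (r, H, R) tuples, where r is a residual vector,
    H its Jacobian w.r.t. the state, and R its noise covariance.
    In a hybrid scheme one block could hold reprojection residuals
    and another ray residuals (assumption, not the paper's exact form).
    """
    # Stack residuals and Jacobians into one unified measurement model.
    r = np.concatenate([b[0] for b in blocks])
    H = np.vstack([b[1] for b in blocks])

    # Assemble a block-diagonal noise covariance (blocks assumed independent).
    sizes = [b[0].size for b in blocks]
    R = np.zeros((sum(sizes), sum(sizes)))
    offset = 0
    for (_, _, R_b), n in zip(blocks, sizes):
        R[offset:offset + n, offset:offset + n] = R_b
        offset += n

    # Kalman gain K = P H^T S^{-1}, computed with a linear solve
    # (valid because P and S are symmetric).
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T

    # Additive correction; a real VINS would use a manifold update for rotation.
    x_new = x + K @ r

    # Joseph-form covariance update keeps P symmetric positive semi-definite.
    I_KH = np.eye(P.shape[0]) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ R @ K.T
    return x_new, P_new


# Toy usage with a 3-dimensional state and two residual blocks.
x0 = np.zeros(3)
P0 = np.eye(3)
block_a = (np.array([1.0, 0.0]), np.array([[1.0, 0, 0], [0, 1.0, 0]]), 0.1 * np.eye(2))
block_b = (np.array([0.5]), np.array([[0, 0, 1.0]]), np.array([[0.1]]))
x1, P1 = stacked_ekf_update(x0, P0, [block_a, block_b])
```

Stacking the blocks before the update means the gain is computed once over all constraints, which is the usual reason for building a single Jacobian rather than applying sequential updates.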

Paper Structure

This paper contains 15 sections, 30 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: (a) Framework of SP-VINS, which, unlike traditional 3D-map-based SLAM systems, performs global drift correction using only an implicit environmental map composed of keyframes and 2D keypoints; (b) Comparison of SP-VINS with three SOTA methods on sequence Urban28 [jeong2019complex].
  • Figure 2: Comparison of the visual residual representations of OpenVINS [9196524], PO filter-based VIO [du2025spvio, wang2025po], and SP-VINS.
  • Figure 3: (a) Geometric representation of the ray-based visual residual model (see the section "Ray-based Visual Residual"); (b) Geometric representation of loop closure based on the implicit environmental map (see the section "Implicit Map-based Relocalization").
  • Figure 4: RMSE of RPE for the compared algorithms on KAIST-Urban.
  • Figure 5: (a)-(c) Average runtime of the compared algorithms on different datasets (unit: ms). Notably, the back-end runtime of OpenVINS is measured with the hybrid update enabled, while that of Voxel-SVIO includes voxel-map creation and management.
  • ...and 1 more figure