Advancements in Translation Accuracy for Stereo Visual-Inertial Initialization

Han Song, Zhongche Qu, Zhi Zhang, Zihan Ye, Cong Liu

TL;DR

This paper tackles initialization accuracy in stereo VI-SLAM, where ORB-SLAM3's dependence on pure visual trajectory estimation and Stereo-NEC's costly gyroscope bias estimation limit robustness. It introduces ETA, a Rotation-Translation-Decoupled initialization pipeline that uses a 3-DoF bundle adjustment for translation and IMU-driven rotation updates, followed by a joint visual-inertial MAP refinement. The method runs in four stages: pure stereo SLAM, inertial-only MAP, rotation-translation decoupling with 3-DoF BA, and joint VI-BA, achieving higher translation accuracy without sacrificing runtime. Evaluations on EuRoC show state-of-the-art performance, particularly in challenging sequences like V2_03_difficult, confirming robustness and practical relevance for VI-SLAM initialization.
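The "IMU-driven rotation update" can be pictured as integrating bias-corrected gyroscope readings on SO(3). The sketch below is a generic illustration, not the paper's implementation: the function names, the constant sampling interval `dt`, and the simple first-order integration scheme are all assumptions made for this example.

```python
import math

def matmul3(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def so3_exp(phi):
    """Rodrigues' formula: rotation vector (axis * angle) -> 3x3 rotation matrix."""
    x, y, z = phi
    theta = math.sqrt(x * x + y * y + z * z)
    K = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]  # skew-symmetric matrix of phi
    K2 = matmul3(K, K)
    if theta < 1e-8:
        a, b = 1.0, 0.5  # small-angle limits of sin(t)/t and (1 - cos(t))/t^2
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

def propagate_rotation(R0, gyro, bias, dt):
    """Hypothetical helper: R <- R * Exp((omega - bias) * dt) for each gyro sample."""
    R = [row[:] for row in R0]
    for w in gyro:
        phi = [(w[i] - bias[i]) * dt for i in range(3)]
        R = matmul3(R, so3_exp(phi))
    return R

# Synthetic data (assumed): a constant yaw rate of pi/2 rad/s plus a 0.01 rad/s bias,
# sampled 100 times at dt = 0.01 s, i.e. a quarter turn about z after bias removal.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
gyro = [(0.0, 0.0, math.pi / 2 + 0.01)] * 100
R_final = propagate_rotation(identity, gyro, (0.0, 0.0, 0.01), 0.01)
```

If the bias were left uncorrected here, the integrated yaw would overshoot by 0.01 rad over the one-second window, which is why the gyroscope bias estimate matters for the rotation that the later stages rely on.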

Abstract

The initialization method in ORB-SLAM3, the state-of-the-art stereo visual-inertial SLAM framework, has limitations. Its success depends on the performance of the pure stereo SLAM system and rests on the assumption that pure visual SLAM can accurately estimate the camera trajectory, which is essential for inertial parameter estimation. Stereo-NEC, an improved initialization method for ORB-SLAM3, is time-consuming because it applies keypoint tracking to estimate the gyroscope bias with normal epipolar constraints. To address these limitations, this paper proposes a method that enhances translation accuracy during the initialization stage. The core idea is to refine the translation estimate independently with a 3 Degree-of-Freedom (DoF) Bundle Adjustment (BA) while the rotation estimate is held fixed, instead of using ORB-SLAM3's 6-DoF BA. In addition, the rotation estimate is updated using IMU measurements and the gyroscope bias, unlike ORB-SLAM3's rotation, which is obtained directly from stereo visual odometry and may yield inferior results in challenging scenarios. Extensive evaluations on the public EuRoC benchmark dataset demonstrate that our method excels in accuracy.
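To make the 3-DoF idea concrete, the following minimal sketch refines only the translation of a single camera pose by Gauss-Newton on the reprojection error, with the rotation held fixed. It is a simplification under stated assumptions and not the authors' implementation: landmarks are treated as known and fixed, observations are in normalized image coordinates, there is one frame rather than a window of keyframes, and all names (`refine_translation`, `R_fixed`, and so on) are hypothetical.

```python
import math

def project(R, t, p_w):
    """Camera-frame point p_c = R p_w + t and its normalized image coordinates."""
    pc = [sum(R[i][k] * p_w[k] for k in range(3)) + t[i] for i in range(3)]
    return pc, (pc[0] / pc[2], pc[1] / pc[2])

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    """Solve the 3x3 linear system A x = b with Cramer's rule."""
    d = det3(A)
    return [det3([[b[r] if k == c else A[r][k] for k in range(3)]
                  for r in range(3)]) / d for c in range(3)]

def refine_translation(R, t0, landmarks, observations, iterations=10):
    """Gauss-Newton over the 3 translation parameters only; R is never updated."""
    t = list(t0)
    for _ in range(iterations):
        H = [[0.0] * 3 for _ in range(3)]  # approximate Hessian J^T J
        g = [0.0] * 3                      # gradient J^T r
        for p_w, (u_obs, v_obs) in zip(landmarks, observations):
            (x, y, z), (u, v) = project(R, t, p_w)
            # Since d(p_c)/dt = I, the Jacobian is just the projection derivative.
            rows = ([1.0 / z, 0.0, -x / z ** 2], [0.0, 1.0 / z, -y / z ** 2])
            for J, r in zip(rows, (u - u_obs, v - v_obs)):
                for i in range(3):
                    g[i] += J[i] * r
                    for j in range(3):
                        H[i][j] += J[i] * J[j]
        delta = solve3(H, [-gi for gi in g])
        t = [ti + di for ti, di in zip(t, delta)]
    return t

# Synthetic, noise-free data (assumed): a fixed rotation about the y axis.
c, s = math.cos(0.2), math.sin(0.2)
R_fixed = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
t_true = [0.1, -0.2, 0.3]
landmarks = [(0.0, 0.0, 4.0), (1.0, 0.0, 5.0), (-1.0, 1.0, 6.0),
             (0.5, -1.0, 4.5), (-0.5, -0.5, 5.5), (1.0, 1.0, 7.0)]
observations = [project(R_fixed, t_true, p)[1] for p in landmarks]
t_est = refine_translation(R_fixed, [0.0, 0.0, 0.0], landmarks, observations)
```

With rotation removed from the state, each pose contributes a 3x3 block instead of a 6x6 one, and translation errors cannot be absorbed by compensating rotation changes. A full bundle adjustment would typically run this over many keyframes at once, often with landmarks in the state as well.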

Paper Structure

This paper contains 14 sections, 6 equations, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: A diagram of the proposed pipeline. It first runs pure stereo SLAM, then uses an inertial-only optimizer to estimate the inertial parameters. Next, an efficient 3-DoF BA refines the translation estimates. Finally, a joint Visual-Inertial BA optimizes all visual and inertial parameters together.
  • Figure 2: An Asctec Firefly hex-rotor aerial drone was used, equipped with a visual-inertial sensor unit comprising a camera and an IMU.