Stereo-NEC: Enhancing Stereo Visual-Inertial SLAM Initialization with Normal Epipolar Constraints

Weihan Wang, Chieh Chou, Ganesh Sevagamoorthy, Kevin Chen, Zheng Chen, Ziyue Feng, Youjie Xia, Feiyang Cai, Yi Xu, Philippos Mordohai

TL;DR

Stereo-NEC tackles stereo VI-SLAM initialization by first estimating the gyroscope bias to improve rotation accuracy, then refining acceleration and gravity through a MAP formulation. It extends the monocular normal epipolar constraint to stereo observations, enabling an eigenvalue-based bias estimator that informs a bias-corrected rotation followed by a 3-DoF translation optimization, and culminates with a joint VI-BA. The key contributions are the eigenvalue-based gyroscope bias estimator, the rotation-translation-decoupled optimization, and the novel initialization-success criterion based on the normal epipolar residual. Evaluations on EuRoC show superior ATE and RRE compared to ORB-SLAM3, with robust performance at reduced keyframes and competitive computation speed, underscoring practical impact for robust stereo VI-SLAM initialization.

Abstract

We propose an accurate and robust initialization approach for stereo visual-inertial SLAM systems. Unlike the current state-of-the-art method, which heavily relies on the accuracy of a pure visual SLAM system to estimate inertial variables without updating camera poses, potentially compromising accuracy and robustness, our approach offers a different solution. We realize the crucial impact of precise gyroscope bias estimation on rotation accuracy. This, in turn, affects trajectory accuracy due to the accumulation of translation errors. To address this, we first independently estimate the gyroscope bias and use it to formulate a maximum a posteriori problem for further refinement. After this refinement, we proceed to update the rotation estimation by performing IMU integration with gyroscope bias removed from gyroscope measurements. We then leverage robust and accurate rotation estimates to enhance translation estimation via 3-DoF bundle adjustment. Moreover, we introduce a novel approach for determining the success of the initialization by evaluating the residual of the normal epipolar constraint. Extensive evaluations on the EuRoC dataset illustrate that our method excels in accuracy and robustness. It outperforms ORB-SLAM3, the current leading stereo visual-inertial initialization method, in terms of absolute trajectory error and relative rotation error, while maintaining competitive computational speed. Notably, even with 5 keyframes for initialization, our method consistently surpasses the state-of-the-art approach using 10 keyframes in rotation accuracy.
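
The rotation update described in the abstract (IMU integration with the gyroscope bias removed from the measurements) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a constant bias, a fixed sample period, and plain sample-by-sample integration through the SO(3) exponential map. The helper names (`so3_exp`, `integrate_gyro`, `rotation_angle`) and all numeric values are hypothetical.

```python
import math

def so3_exp(w):
    """Rodrigues' formula: rotation vector (rad) -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(x * x for x in w))
    if theta < 1e-12:
        # First-order approximation for very small angles.
        wx, wy, wz = w
        return [[1.0, -wz, wy], [wz, 1.0, -wx], [-wy, wx, 1.0]]
    kx, ky, kz = (x / theta for x in w)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def integrate_gyro(gyro_samples, dt, gyro_bias):
    """Accumulate rotation from gyroscope samples after subtracting the bias."""
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for w in gyro_samples:
        corrected = [wi - bi for wi, bi in zip(w, gyro_bias)]
        R = matmul(R, so3_exp([c * dt for c in corrected]))
    return R

def rotation_angle(R):
    """Angle (rad) of a rotation matrix, from its trace."""
    trace = R[0][0] + R[1][1] + R[2][2]
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))

# Hypothetical data: 1 s of rotation about z at 0.5 rad/s, corrupted by a bias.
true_rate = (0.0, 0.0, 0.5)
bias = (0.01, -0.02, 0.03)
dt = 0.005
samples = [tuple(r + b for r, b in zip(true_rate, bias))] * 200

R_corrected = integrate_gyro(samples, dt, bias)
R_uncorrected = integrate_gyro(samples, dt, (0.0, 0.0, 0.0))
print(rotation_angle(R_corrected), rotation_angle(R_uncorrected))
```

With the bias subtracted, the integrated angle matches the true 0.5 rad rotation; leaving the bias in produces a rotation error that grows with integration time, which is the effect the abstract attributes to an inaccurate gyroscope bias.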

Paper Structure (17 sections, 10 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: 3D points ($\textbf{p}_i$) are shown as differently colored five-pointed stars. Each pair of bearing vectors forms an epipolar plane (highlighted in green, blue, and yellow), and the corresponding normal vectors ($\textbf{n}_i$) are shown in matching colors. The normal vectors all lie in the same plane. The baseline $\textbf{t}^{ \text{c}_\text{k}}_{\text{c}_\text{k+1}}$ intersects all corresponding epipolar planes.
  • Figure 2: Geometry of the stereo normal epipolar constraint. The blue and red dashed triangles represent the left and right cameras, respectively. Left $O_\text{k}$ and Left $O_\text{k+1}$ are the left optical centers at times $\text{k}$ and $\text{k+1}$; Right $O_\text{k}$ and Right $O_\text{k+1}$ are the corresponding right optical centers. The orange line is the baseline of the stereo cameras. The temporal epipolar planes of the left camera are colored green and yellow, and those of the right camera blue and purple. The normal vector ($\textbf{n}_i$) of each temporal epipolar plane is drawn in the matching color. Only normal vectors from the same camera are coplanar.
  • Figure 3: Step 3: Rotation update via IMU integration, followed by translation optimization using 3-DoF bundle adjustment.
  • Figure 4: Relative Rotation Error (RRE) for different methods with different numbers of keyframes. Left: Results with 5 and 10 keyframes without VI-BA for initialization. Right: Results with 5 and 10 keyframes with VI-BA applied for initialization.
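
The coplanarity property described in the Figure 1 caption can be checked numerically. The sketch below is a monocular, two-view illustration under assumed conventions, not the paper's stereo formulation: `R` maps vectors from frame k+1 into frame k, `t` is the position of camera k+1 expressed in frame k, and each normal is $\textbf{n}_i = \textbf{f}_i \times R\,\textbf{f}'_i$. The function `nec_residual` and all point and pose values are hypothetical. The quantity it returns, $\sum_i (\textbf{n}_i \cdot \hat{\textbf{t}})^2$, equals $\hat{\textbf{t}}^\top (\sum_i \textbf{n}_i \textbf{n}_i^\top)\, \hat{\textbf{t}}$, whose minimum over unit directions is the smallest eigenvalue of that matrix.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def matvec(R, v):
    return tuple(dot(row, v) for row in R)

def transpose(R):
    return tuple(zip(*R))

def rot_z(angle):
    """Rotation about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def nec_residual(points, R_true, t, R_est):
    """Sum of squared (n_i . t_dir), with normals built from the estimate R_est.

    Bearings are simulated from the true pose (R_true, t); the normals use
    R_est, so the residual measures how well R_est explains the observations.
    """
    t_dir = normalize(t)
    total = 0.0
    for p in points:
        f_k = normalize(p)  # bearing in frame k
        p_next = matvec(transpose(R_true), tuple(a - b for a, b in zip(p, t)))
        f_next = normalize(p_next)  # bearing in frame k+1
        n = cross(f_k, matvec(R_est, f_next))  # epipolar-plane normal
        total += dot(n, t_dir) ** 2
    return total

# Hypothetical scene and relative pose.
points = [(1.0, 0.5, 4.0), (-1.0, 1.0, 5.0), (0.5, -1.0, 3.0), (2.0, 1.0, 6.0)]
R_true = rot_z(0.10)
t = (0.3, 0.1, 0.05)

residual_true = nec_residual(points, R_true, t, R_true)
residual_bad = nec_residual(points, R_true, t, rot_z(0.15))
print(residual_true, residual_bad)
```

With the correct rotation, every normal is perpendicular to the baseline and the residual vanishes up to floating-point error; a rotation that is off by 0.05 rad leaves a clearly nonzero residual. This is the intuition behind using the constraint both to estimate rotation-related quantities and to judge whether an estimate is consistent with the observations.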