PLE-SLAM: A Visual-Inertial SLAM Based on Point-Line Features and Efficient IMU Initialization

Jiaming He, Mingrui Li, Yangyang Wang, Hongyu Wang

TL;DR

This paper tackles robust visual-inertial SLAM in challenging environments by fusing point and line features and by an efficient IMU initialization that separates gyroscope bias estimation from accelerometer bias and gravity. The system architecture integrates parallel feature extraction, line fusion, DNN-based dynamic feature elimination, and CNN/GraphNet-assisted loop closure, with all modules accelerated for real-time use. A rotation-only gyroscope-bias estimation coupled with an analytical accelerometer bias and gravity solution enables fast, reliable initialization, followed by MAP refinement in the back-end. Experimental results on EuRoC, OpenLORIS-Scene, and TUM-VI demonstrate state-of-the-art localization accuracy and robust loop-closure performance in dynamic and texture-poor scenarios.

Abstract

Visual-inertial SLAM is crucial in various fields, such as aerial vehicles, industrial robots, and autonomous driving. Fusing a camera with an inertial measurement unit (IMU) compensates for the shortcomings of a single sensor, which significantly improves the accuracy and robustness of localization in challenging environments. This article presents PLE-SLAM, an accurate and real-time visual-inertial SLAM algorithm based on point-line features and efficient IMU initialization. First, we use parallel computing methods to extract features and compute descriptors to ensure real-time performance. Adjacent short line segments are merged into long line segments, and isolated short line segments are directly deleted. Second, a rotation-translation-decoupled initialization method is extended to use both points and lines. Gyroscope bias is optimized by tightly coupling IMU measurements and image observations. Accelerometer bias and gravity direction are solved by an analytical method for efficiency. To improve the system's intelligence in handling complex environments, two components are integrated into the system: a scheme that leverages semantic information and geometric constraints to eliminate dynamic features, and a solution for loop detection and closed-loop frame pose estimation using a CNN and a GNN. All networks are accelerated to ensure real-time performance. Experimental results on public datasets show that PLE-SLAM is one of the state-of-the-art visual-inertial SLAM systems.
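
The line-segment step described in the abstract (merge adjacent short segments, delete isolated short ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the merge criteria (orientation difference, endpoint gap, perpendicular offset), the thresholds, and all function names below are assumptions chosen for illustration, and segments are assumed to be `(x1, y1, x2, y2)` tuples in pixels.

```python
import numpy as np

def seg_length(s):
    """Euclidean length of a segment (x1, y1, x2, y2)."""
    return float(np.hypot(s[2] - s[0], s[3] - s[1]))

def seg_angle(s):
    """Undirected orientation of a segment, in [0, pi)."""
    return float(np.arctan2(s[3] - s[1], s[2] - s[0]) % np.pi)

def can_merge(a, b, max_angle, max_gap, max_offset):
    """Hypothetical adjacency test: similar orientation, close endpoints, nearly collinear."""
    d = abs(seg_angle(a) - seg_angle(b))
    d = min(d, np.pi - d)  # orientation wraps around at pi
    if d > max_angle:
        return False
    pa = np.array(a, dtype=float).reshape(2, 2)
    pb = np.array(b, dtype=float).reshape(2, 2)
    # Smallest endpoint-to-endpoint distance between the two segments.
    gap = min(np.linalg.norm(p - q) for p in pa for q in pb)
    if gap > max_gap:
        return False
    # Perpendicular offset of b's endpoints from the infinite line through a.
    u = (pa[1] - pa[0]) / np.linalg.norm(pa[1] - pa[0])
    n = np.array([-u[1], u[0]])
    offset = max(abs(float(np.dot(q - pa[0], n))) for q in pb)
    return offset <= max_offset

def merge_pair(a, b):
    """Project all four endpoints onto the longer segment's line and keep the extremes."""
    long_seg = a if seg_length(a) >= seg_length(b) else b
    p0 = np.array(long_seg[:2], dtype=float)
    u = np.array(long_seg[2:], dtype=float) - p0
    u /= np.linalg.norm(u)
    pts = np.array([a[:2], a[2:], b[:2], b[2:]], dtype=float)
    t = (pts - p0) @ u
    start, end = p0 + t.min() * u, p0 + t.max() * u
    return (float(start[0]), float(start[1]), float(end[0]), float(end[1]))

def merge_and_filter(segments, max_angle=np.deg2rad(3.0), max_gap=5.0,
                     max_offset=2.0, min_len=20.0):
    """Greedily merge adjacent segments, then drop those still shorter than min_len."""
    segs = [tuple(map(float, s)) for s in segments if seg_length(s) > 0.0]
    merged = True
    while merged:  # each merge removes one segment, so the loop terminates
        merged = False
        for i in range(len(segs)):
            for j in range(i + 1, len(segs)):
                if can_merge(segs[i], segs[j], max_angle, max_gap, max_offset):
                    new = merge_pair(segs[i], segs[j])
                    segs = [s for k, s in enumerate(segs) if k not in (i, j)] + [new]
                    merged = True
                    break
            if merged:
                break
    return [s for s in segs if seg_length(s) >= min_len]

# Two nearly collinear neighbours and one isolated short segment.
result = merge_and_filter([(0, 0, 10, 0), (12, 0, 25, 0.2), (50, 50, 53, 51)])
print(result)
```

In this example the first two segments are merged into one segment of roughly 25 pixels, while the isolated 3-pixel segment is discarded. The greedy pairwise loop is quadratic in the number of segments; a real-time system would likely restrict candidates spatially, which is outside the scope of this sketch.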

Paper Structure

This paper contains 26 sections, 23 equations, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: The framework of the proposed system. We extract point and line segment features from stereo images. The 2D feature observations and rotation pre-integration are combined to estimate gyroscope bias by a rotation-only solution. Accelerometer bias and gravity direction are solved by an analytical solution, which makes the initialization process faster than iterative methods. In the back-end, DNN-based features and matching methods are used to detect loop candidate frames and estimate the relative pose of loop frames. To improve robustness in dynamic environments, a dynamic feature elimination thread runs in parallel with the tracking thread, combining semantic information and geometric constraints. The dense point cloud map in the figure is constructed using the left camera keyframe image and the depth map obtained by semi-global block matching.
  • Figure 2: Line segments extracted by EDLines (a) and by our proposed method (b). Different line segments are drawn in different colors to make the effect easier to see. Our method merges and filters short line segments and obtains longer line segments that are more stable to track.
  • Figure 3: Per-sequence testing results on the OpenLORIS-Scene datasets. Each black dot on the top line represents the start of one data sequence. For each algorithm, blue dots indicate successful initialization, and blue lines indicate successful tracking. The percentage at the top left of each scene is the average correct rate; larger means more robust. The value in the first line at the bottom right is the average ATE RMSE, and the two values in the second line are, from left to right, the average T.RPE and the average R.RPE; smaller means more accurate. Parts with excessively large errors are ignored during the entire evaluation, and the corresponding portions of the correct rate are excluded.
  • Figure 4: Trajectory comparison results on corridor1-3. The green line denotes the ground truth and the red line denotes the estimated trajectory. From left to right: ORB-SLAM3, Dynamic-VINS, DGM-VINS, and PLE-SLAM.
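
The ATE RMSE reported in Figure 3 can be illustrated with a short sketch. It assumes the common definition: rigidly align the estimated positions to the ground truth (rotation and translation, no scale), then take the root mean square of the remaining position errors. Whether the evaluation in the paper uses exactly this alignment is an assumption, and the function names `align_rigid` and `ate_rmse` are hypothetical.

```python
import numpy as np

def align_rigid(est, gt):
    """Least-squares rotation R and translation t mapping est onto gt (Kabsch method).

    est, gt: (N, 3) arrays of time-associated positions.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """Absolute trajectory error (RMSE of position residuals after rigid alignment)."""
    R, t = align_rigid(est, gt)
    aligned = est @ R.T + t
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

# Synthetic check: an estimate that differs from the ground truth only by a
# rigid transform should have an ATE close to zero.
rng = np.random.default_rng(0)
gt = rng.normal(size=(50, 3))
c, s = np.cos(0.3), np.sin(0.3)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
est = gt @ Rz.T + np.array([1.0, -2.0, 0.5])
print(ate_rmse(est, gt))
```

The sketch assumes the two trajectories are already associated by timestamp and have the same number of poses; real evaluation tools also handle timestamp matching and gaps in tracking.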