SuperVINS: A Real-Time Visual-Inertial SLAM Framework for Challenging Imaging Conditions

Hongkun Luo; Yang Liu; Chi Guo; Zengke Li; Weiwei Song

TL;DR

Experimental validation on the well-known EuRoC and UMA-VI datasets demonstrates that SuperVINS achieves accuracy and robustness comparable to other state-of-the-art VI-SLAM systems, particularly in the most challenging sequences.

Abstract

Traditional visual-inertial SLAM systems often struggle with stability under low-light or motion-blur conditions, which can lead to loss of trajectory tracking. High accuracy and robustness are essential for long-term, stable localization in SLAM systems. To improve robustness and accuracy in visual-inertial SLAM, this paper proposes SuperVINS, a real-time visual-inertial SLAM framework designed for challenging imaging conditions. In contrast to geometric modeling, deep learning features can exploit implicit information in images that geometric features often fail to capture. SuperVINS, developed as an enhancement of VINS-Fusion, therefore integrates the SuperPoint deep neural network for feature point extraction and loop closure detection. The LightGlue deep neural network is integrated into front-end feature matching to associate feature points. A feature matching enhancement strategy based on the RANSAC algorithm is also proposed. The system allows different masks and RANSAC thresholds to be set for different environments, thereby balancing computational cost and localization accuracy. It also allows a SuperPoint bag of words to be trained specifically for loop closure detection in a particular environment. The system performs localization and mapping in real time. Experimental validation on the well-known EuRoC dataset demonstrates that SuperVINS is comparable to other visual-inertial SLAM systems in accuracy and robustness across the most challenging sequences. This paper analyzes the advantages of SuperVINS in terms of accuracy, real-time performance, and robustness. To facilitate knowledge exchange within the field, we have made the code for this paper publicly available.
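
The idea of filtering learned feature matches with RANSAC under an environment-dependent threshold can be illustrated with a minimal sketch. This is not the SuperVINS implementation: the function name `ransac_filter`, its parameters, and the sample data are hypothetical, and the motion model is a plain 2D translation chosen only to keep the example dependency-free. A visual-inertial front end would typically fit an epipolar model (for example, a fundamental matrix) instead.

```python
import random


def ransac_filter(matches, threshold_px=1.0, iterations=100, seed=0):
    """Return indices of matches consistent with the best translation hypothesis.

    matches: list of ((x1, y1), (x2, y2)) pixel pairs, e.g. from a learned matcher.
    threshold_px: maximum residual in pixels for a match to count as an inlier.
    NOTE: the translation-only model is an illustrative assumption.
    """
    rng = random.Random(seed)
    best_inliers = []
    if not matches:
        return best_inliers
    for _ in range(iterations):
        # Minimal sample: a single match fully determines a 2D translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        inliers = []
        for i, ((u1, v1), (u2, v2)) in enumerate(matches):
            residual = ((u1 + dx - u2) ** 2 + (v1 + dy - v2) ** 2) ** 0.5
            if residual <= threshold_px:
                inliers.append(i)
        # Keep the hypothesis supported by the most matches.
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers


# Hypothetical data: eight matches shifted by (5, -2) plus two gross outliers.
points = [(10, 10), (20, 35), (40, 12), (55, 60), (70, 25), (80, 80), (15, 90), (95, 5)]
good = [((float(x), float(y)), (x + 5.0, y - 2.0)) for x, y in points]
bad = [((30.0, 30.0), (90.0, 10.0)), ((60.0, 60.0), (5.0, 75.0))]
matches = good + bad

strict = ransac_filter(matches, threshold_px=1.0)    # tight threshold
loose = ransac_filter(matches, threshold_px=100.0)   # permissive threshold
```

With the tight threshold only the eight consistent matches survive, while the permissive threshold admits all ten. The threshold therefore controls how aggressively matches are pruned before they reach the back end.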

Paper Structure (14 sections, 5 equations, 13 figures, 2 tables)

Introduction
Related Work
Traditional visual-inertial SLAM
Feature Extraction and Matching Method Based on Deep Learning
Visual-inertial SLAM based on deep learning
System Overview
Method
SuperPoint Feature Extraction and LightGlue Feature Matching
Matching Enhancement Strategy
Loop Closure Detection with Deep Learning
Experiments
Accuracy Testing of SuperVINS
Real-time and Robustness Testing of SuperVINS
Conclusion

Figures (13)

Figure 1: System Introduction
Figure 2: Schematic of deep learning descriptor transfer.
Figure 3: Overview of the SuperVINS framework: The system integrates camera and IMU data as input. It employs SuperPoint and LightGlue to match features between consecutive image frames while performing pre-integration. After the LightGlue matching process, SuperVINS utilizes the RANSAC algorithm to enhance the accuracy of feature correspondences. Upon completion of front-end optimization, the extracted features are synchronously transmitted to the node responsible for loop closure detection. SuperVINS constructs keyframes and conducts sliding window-based pose optimization. After pose calculations, the system relays the features, pose, and point cloud maps of the keyframes to the loop closure detection node, which employs DBoW3 for feature retrieval and pose graph optimization.
Figure 4: Comparison of Traditional Geometric Feature Points and Deep Learning Feature Points in Visual Inertial SLAM.
Figure 5: Matching Effect Comparison. The figure illustrates three methods of feature extraction and association for the same pair of images. It is evident that utilizing deep learning for feature extraction and matching yields superior results.
...and 8 more figures
