Light-SLAM: A Robust Deep-Learning Visual SLAM System Based on LightGlue under Challenging Lighting Conditions
Zhiqi Zhao, Chang Wu, Xiaotong Kong, Zejie Lv, Xiaoqi Du, Qiyan Li
TL;DR
Light-SLAM is a hybrid visual SLAM system that combines deep local features and LightGlue-based matching with traditional geometry-based methods to improve robustness under challenging lighting. It replaces hand-crafted features with deep local feature descriptors, matches them with an attention-based pipeline, and uses an optimized parallel image pyramid plus a stereo depth module to maintain real-time performance on GPU. Across KITTI, EuRoC, TUM, 4Season, and real campus scenes, Light-SLAM consistently outperforms the traditional ORB-SLAM2 and several deep-learning-only baselines, especially in low-light and high-contrast conditions. The results indicate improved accuracy and robustness, which matters for autonomous systems operating under variable illumination.
Abstract
Simultaneous Localization and Mapping (SLAM) has become a critical technology for intelligent transportation systems and autonomous robots and is widely used in autonomous driving. However, traditional methods based on hand-crafted features struggle to ensure robustness and accuracy in challenging lighting environments. Some deep learning-based methods show potential but still have significant drawbacks. To address this problem, we propose a novel hybrid visual SLAM system based on the LightGlue deep learning network. It replaces traditional hand-crafted features with deep local feature descriptors and uses a more efficient and accurate deep network to achieve fast and precise feature matching, so that the robustness of deep learning benefits the whole system. By combining these components with traditional geometry-based approaches, we introduce a complete visual SLAM system for monocular, stereo, and RGB-D sensors. We thoroughly tested the proposed system on four public datasets (KITTI, EuRoC, TUM, and 4Season) as well as on real campus scenes. The experimental results show that the proposed method adapts to low-light and strongly light-varying environments with better accuracy and robustness than both methods based on traditional hand-crafted features and deep learning-based methods. It also runs in real time on a GPU.
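The TL;DR mentions a parallel image pyramid as one of the ingredients for real-time performance. The following is a minimal sketch of that general idea, not the authors' implementation: each pyramid level is resampled directly from the base image, so the levels are independent and can be dispatched to workers concurrently. The function names, the nearest-neighbour resampling, and the default values for `n_levels` and `scale_factor` are all assumptions made for illustration. Pure-Python threads also do not give a real speedup here; the sketch only shows the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def downsample(image, scale):
    """Nearest-neighbour resize of a 2D list `image` by a factor of 1/scale.

    Hypothetical helper for illustration; a real system would use an
    optimized (e.g. GPU) resize instead.
    """
    h, w = len(image), len(image[0])
    new_h = max(1, round(h / scale))
    new_w = max(1, round(w / scale))
    return [
        [image[min(h - 1, int(r * scale))][min(w - 1, int(c * scale))]
         for c in range(new_w)]
        for r in range(new_h)
    ]


def build_pyramid_parallel(image, n_levels=4, scale_factor=1.2):
    """Build all pyramid levels concurrently from the base image.

    Level i is the base image shrunk by scale_factor ** i. Because no level
    depends on another, the levels can be computed in any order.
    `n_levels` and `scale_factor` are illustrative defaults (assumptions).
    """
    scales = [scale_factor ** i for i in range(n_levels)]
    with ThreadPoolExecutor(max_workers=n_levels) as pool:
        # pool.map preserves the order of `scales` in its results.
        levels = list(pool.map(lambda s: downsample(image, s), scales))
    return scales, levels


if __name__ == "__main__":
    # Toy 8x8 "image" whose pixel value encodes its position.
    base = [[r * 8 + c for c in range(8)] for r in range(8)]
    scales, levels = build_pyramid_parallel(base, n_levels=3, scale_factor=2.0)
    for s, lvl in zip(scales, levels):
        print(f"scale {s}: {len(lvl)} x {len(lvl[0])}")
```

In a feature-based pipeline, keypoints would then be extracted per level and their coordinates multiplied by the level's scale to map them back to the base image.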
