AirSLAM: An Efficient and Illumination-Robust Point-Line Visual SLAM System

Kuan Xu, Yuefan Hao, Shenghai Yuan, Chen Wang, Lihua Xie

TL;DR

AirSLAM addresses both short-term and long-term illumination challenges in visual SLAM by combining a unified point-line detector (PLNet) with a hybrid design: learned feature detection and matching in the front-end, and traditional optimization in the back-end. Its GPU-accelerated pipeline jointly detects points and lines, matches them with LightGlue, and triangulates them into a point-line map, while a multi-stage relocalization module enables drift-free map reuse under varying lighting. Offline map optimization (loop closure, map merging, global bundle adjustment, and a scene-dependent junction vocabulary) yields a refined map that supports robust online relocalization. Empirical results across multiple datasets show strong accuracy, high efficiency (up to 73 FPS on a PC and 40 FPS on an embedded platform), and superior illumination robustness. The implementation is open source.

Abstract

In this paper, we present an efficient visual SLAM system designed to tackle both short-term and long-term illumination challenges. Our system adopts a hybrid approach that combines deep learning techniques for feature detection and matching with traditional backend optimization methods. Specifically, we propose a unified convolutional neural network (CNN) that simultaneously extracts keypoints and structural lines. These features are then associated, matched, triangulated, and optimized in a coupled manner. Additionally, we introduce a lightweight relocalization pipeline that reuses the built map, where keypoints, lines, and a structure graph are used to match the query frame with the map. To enhance the applicability of the proposed system to real-world robots, we deploy and accelerate the feature detection and matching networks using C++ and NVIDIA TensorRT. Extensive experiments conducted on various datasets demonstrate that our system outperforms other state-of-the-art visual SLAM systems in illumination-challenging environments. Efficiency evaluations show that our system can run at a rate of 73Hz on a PC and 40Hz on an embedded platform. Our implementation is open-sourced: https://github.com/sair-lab/AirSLAM.

Paper Structure (62 sections, 22 equations, 15 figures, 9 tables)

Figures (15)

  • Figure 1: The proposed system consists of three main parts: online stereo VO/VIO, offline map optimization, and online relocalization. The VO/VIO module uses the mapping image sequences to build an initial map. The initial map is then processed offline to produce an optimized map, which can be used for one-shot relocalization.
  • Figure 2: We visualize the feature map (top right) and detected keypoints (bottom left) of a keypoint detection model, and the detected structural lines (bottom right) of a line detection model. The overlap of keypoints and junctions, and the edge information in the feature map inspire the design of our PLNet.
  • Figure 3: The framework of the proposed PLNet. It consists of the shared backbone, the keypoint module, and the line module.
  • Figure 4: We use four parameters, $d, \theta, \theta_1, \theta_2$, to encode a line into a point $\mathbf{p}$ within the attraction region field.
  • Figure 5: We use seven types of photometric data augmentation to train our PLNet to make it more robust to challenging illumination.
  • ...and 10 more figures
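
The Figure 4 caption names four parameters, $d, \theta, \theta_1, \theta_2$, but this summary does not define them. The sketch below illustrates one plausible convention, and everything in it is an assumption rather than the paper's definition: $d$ is taken as the distance from $\mathbf{p}$ to the line, $\theta$ as the direction from $\mathbf{p}$ to its perpendicular foot on the line, and $\theta_1, \theta_2$ as the angles from that perpendicular to the two endpoints. The function names `encode_line` and `decode_line` are hypothetical.

```python
import math

def encode_line(p, x1, x2):
    """Encode segment (x1, x2) relative to point p as (d, theta, theta1, theta2).

    Assumed convention (not taken from the paper):
      d      : distance from p to the infinite line through x1 and x2
      theta  : direction of the vector from p to its perpendicular foot
      theta1 : angle from that perpendicular to the ray p -> x1
      theta2 : angle from that perpendicular to the ray p -> x2
    """
    px, py = p
    ax, ay = x1
    bx, by = x2
    ux, uy = bx - ax, by - ay
    length = math.hypot(ux, uy)
    if length == 0.0:
        raise ValueError("degenerate segment: endpoints coincide")
    ux, uy = ux / length, uy / length

    # Perpendicular foot of p on the line.
    s = (px - ax) * ux + (py - ay) * uy
    fx, fy = ax + s * ux, ay + s * uy

    nx, ny = fx - px, fy - py
    d = math.hypot(nx, ny)
    if d < 1e-9:
        raise ValueError("p lies on the line; this encoding is undefined there")
    theta = math.atan2(ny, nx)

    # Second axis of the local frame, perpendicular to the p -> foot direction.
    e2x, e2y = -math.sin(theta), math.cos(theta)
    t1 = (ax - px) * e2x + (ay - py) * e2y  # signed offset of x1 along the line
    t2 = (bx - px) * e2x + (by - py) * e2y  # signed offset of x2 along the line
    return d, theta, math.atan2(t1, d), math.atan2(t2, d)

def decode_line(p, d, theta, theta1, theta2):
    """Invert encode_line: recover the two endpoints from the four parameters."""
    c, s = math.cos(theta), math.sin(theta)

    def endpoint(angle):
        # Local coordinates (d, d * tan(angle)), rotated by theta, shifted by p.
        lx, ly = d, d * math.tan(angle)
        return (p[0] + c * lx - s * ly, p[1] + s * lx + c * ly)

    return endpoint(theta1), endpoint(theta2)

p = (1.0, 1.0)
params = encode_line(p, (0.0, 3.0), (4.0, 3.0))
recovered = decode_line(p, *params)
```

Under this convention every point off the line carries enough information to reconstruct the whole segment, which is what makes a per-pixel field representation of lines possible. Points lying exactly on the line are excluded here because the perpendicular direction is undefined for them.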