Solving Short-Term Relocalization Problems In Monocular Keyframe Visual SLAM Using Spatial And Semantic Data

Azmyin Md. Kamal, Nenyi K. N. Dadson, Donovan Gegg, Corina Barbalata

TL;DR

This work tackles short-term relocalization in Monocular Keyframe Visual SLAM (MKVSLAM) by introducing a Pose Semantic Descriptor (PSD) that fuses semantic object information with camera pose, and a three-stage Pose-Class-Box (PCB) keyframe place recognition (KPR) method. The PCB-KPR pipeline is integrated into ORB-SLAM3 to rapidly select candidate keyframes and solve 3D-to-2D correspondences via PnP (MLPnP), enabling fast global pose recovery in GPS-denied indoor environments. Across 18 indoor sequences, the approach retrieves candidates faster than BoW-based relocalization, reduces lost-state duration by about 50%, and maintains real-time performance. Results are robust in several challenging scenes, although the approach remains limited by bundle adjustment (BA) in ill-conditioned cases. Overall, combining semantic and spatial cues in a multimodal keyframe descriptor with a structured KPR filter provides a practical improvement for reliable, real-time relocalization in monocular VSLAM.

Abstract

In Monocular Keyframe Visual Simultaneous Localization and Mapping (MKVSLAM) frameworks, when incremental position tracking fails, the global pose has to be recovered within a short time window, a task known as short-term relocalization. This capability is crucial for mobile robots to navigate reliably, build accurate maps, and behave precisely around human collaborators. This paper focuses on the development of robust short-term relocalization capabilities for mobile robots using a monocular camera system. A novel multimodal keyframe descriptor is introduced that contains semantic information about objects detected in the environment and spatial information about the camera. Using this descriptor, a new Keyframe-based Place Recognition (KPR) method is proposed, formulated as a multi-stage keyframe filtering algorithm, leading to a new relocalization pipeline for MKVSLAM systems. The proposed approach is evaluated on several indoor GPS-denied datasets and demonstrates accurate pose recovery in comparison to a bag-of-words approach.

Paper Structure (13 sections, 11 equations, 5 figures, 2 tables, 1 algorithm)

Figures (5)

  • Figure 1: High-level system overview: For each keyframe (colored rectangles), the proposed multimodal descriptor is formed using semantic and spatial data. When tracking is lost in the red keyframe, the proposed KPR method selects a number of keyframes from the pose graph. By solving a series of correspondences between 3D map points (in a candidate keyframe) and 2D keypoints (in the query keyframe), an estimate of the global pose is found. After a pose graph optimization step, the global pose is recovered in the yellow keyframe and incremental tracking resumes in the green keyframes.
  • Figure 2: Overview of the full framework: An image is passed through the purple block, where the preprocess block resizes the image, which is then fed to the YOLOv5 object detector [yolov5]. Next, the postprocess block performs non-maximal suppression [neubeck2006efficient] on the detections to finalize class predictions for the detected objects. The MKVSLAM module is shown in light blue. The path in green shows the active tracking state. The path in orange shows a relocalization attempt in a lost state. The path in red indicates a full recovery failure. The relocalization pipeline, identified by the dashed deep blue rectangle, consists of computing the PSD descriptor of the query keyframe, choosing optimal candidates using the proposed KPR method, and recovering the global camera position by solving a series of PnP problems.
  • Figure 3: Application of the proposed KPR method in a 2D case. The top row shows the collection of objects seen by the query keyframe and some of the candidate keyframes. Colored triangles in the active map represent keyframes; those not chosen by the proposed approach are shown in red. Keyframes that are selected in each of the three steps are colored in green. The shaded blue circle is the search sphere in the pose constraint. In the class constraint, only keyframes 21 and 31 are selected, as they contain the same types of objects as the query keyframe (identified by the blue triangle). The box constraint retains only keyframe 21, as the locations of its observed objects match those in the query keyframe.
  • Figure 4: Execution time for the KPR methods and the full relocalization methods. The deep green bar represents the time for the proposed KPR, deep blue for DBoW2 KPR, light green for the proposed relocalization method, and light blue for the DBoW2 relocalization method. Best viewed in color.
  • Figure 5: Trajectory estimates from ORB-SLAM3 with the modified relocalization method (blue) and with the DBoW2 version (red). Where available, ground-truth trajectories are shown in grey.
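
The three constraints described in the Figure 3 caption (pose, class, box) can be read as successive filters over the keyframes in the active map. The following is a minimal Python sketch of that idea, not the authors' implementation. The `Detection` and `Keyframe` types, the `pcb_filter` function, the `radius` and `min_iou` parameters, the use of IoU as the box-matching test, and the use of the last known camera position as the query position are all assumptions made for illustration; the paper defines the actual descriptor and thresholds.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    class_id: int  # object class label from the detector
    box: Box       # bounding box of the detected object


@dataclass
class Keyframe:
    kf_id: int
    position: Tuple[float, float, float]  # camera position in the map frame
    detections: List[Detection]           # semantic part of the descriptor


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pcb_filter(query: Keyframe, keyframes: List[Keyframe],
               radius: float = 2.0, min_iou: float = 0.5) -> List[Keyframe]:
    # Stage 1 (pose): keep keyframes inside a search sphere around the query.
    # Assumption: query.position holds the last known camera position.
    near = [kf for kf in keyframes if dist(kf.position, query.position) <= radius]

    # Stage 2 (class): keep keyframes that observe the same set of object classes.
    q_classes = {d.class_id for d in query.detections}
    same_class = [kf for kf in near
                  if {d.class_id for d in kf.detections} == q_classes]

    # Stage 3 (box): every query detection must overlap a same-class detection
    # in the candidate (IoU is a stand-in for the paper's box constraint).
    def boxes_match(kf: Keyframe) -> bool:
        return all(
            any(d.class_id == q.class_id and iou(d.box, q.box) >= min_iou
                for d in kf.detections)
            for q in query.detections
        )

    return [kf for kf in same_class if boxes_match(kf)]


# Toy data loosely mirroring Figure 3 (all values are made up).
query = Keyframe(0, (0.0, 0.0, 0.0), [
    Detection(0, (100, 100, 200, 200)),
    Detection(1, (300, 120, 380, 220)),
])
keyframes = [
    Keyframe(11, (0.5, 0.0, 0.0), [Detection(2, (100, 100, 200, 200))]),
    Keyframe(21, (0.3, 0.1, 0.0), [Detection(0, (105, 98, 205, 198)),
                                   Detection(1, (295, 125, 375, 225))]),
    Keyframe(31, (-0.4, 0.2, 0.0), [Detection(0, (400, 300, 500, 400)),
                                    Detection(1, (20, 20, 100, 120))]),
    Keyframe(41, (5.0, 0.0, 0.0), [Detection(0, (105, 98, 205, 198)),
                                   Detection(1, (295, 125, 375, 225))]),
]
candidates = pcb_filter(query, keyframes)
print([kf.kf_id for kf in candidates])
```

In this toy data, keyframe 41 lies outside the search sphere, keyframe 11 sees a different object class, and keyframe 31 sees the right classes in the wrong image locations, so only keyframe 21 survives all three stages. Ordering the stages from cheapest to most expensive means the box comparison runs only on the few keyframes that pass the earlier checks.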