Solving Short-Term Relocalization Problems In Monocular Keyframe Visual SLAM Using Spatial And Semantic Data
Azmyin Md. Kamal, Nenyi K. N. Dadson, Donovan Gegg, Corina Barbalata
TL;DR
This work tackles short-term relocalization in Monocular Keyframe Visual SLAM (MKVSLAM) by introducing a Pose Semantic Descriptor (PSD) that fuses semantic object information with camera pose, and a three-stage Pose-Class-Box (PCB) keyframe place recognition (KPR) method. The PCB-KPR pipeline is integrated into ORB-SLAM3 to rapidly select candidate keyframes and solve 3D-to-2D correspondences with a Perspective-n-Point (PnP) solver (MLPnP), enabling fast global pose recovery in GPS-denied indoor environments. Across 18 indoor sequences, the approach retrieves candidates faster than bag-of-words (BoW) relocalization, reduces lost-state duration by about 50%, and maintains real-time performance. Results are robust in several challenging scenes, although performance is limited by bundle adjustment (BA) in ill-conditioned cases. Overall, combining semantic and spatial cues in a multimodal keyframe descriptor with a structured KPR filter provides a practical improvement for reliable, real-time relocalization in monocular VSLAM.
Abstract
In Monocular Keyframe Visual Simultaneous Localization and Mapping (MKVSLAM) frameworks, when incremental position tracking fails, the global pose has to be recovered within a short time window, a process known as short-term relocalization. This capability is crucial for mobile robots to navigate reliably, build accurate maps, and behave precisely around human collaborators. This paper focuses on the development of robust short-term relocalization capabilities for mobile robots using a monocular camera system. A novel multimodal keyframe descriptor is introduced that contains semantic information about objects detected in the environment and spatial information about the camera. Using this descriptor, a new Keyframe-based Place Recognition (KPR) method is proposed, formulated as a multi-stage keyframe filtering algorithm, leading to a new relocalization pipeline for MKVSLAM systems. The proposed approach is evaluated on several indoor GPS-denied datasets and demonstrates accurate pose recovery in comparison to a bag-of-words approach.
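To make the idea of a multi-stage keyframe filter over a pose-plus-semantics descriptor more concrete, the following is a minimal sketch and not the authors' implementation. The summary above does not specify what each stage computes, so everything here is an assumption made for illustration: the `PSD` fields, the use of the last known camera position as the query, the Euclidean distance gate (pose stage), the shared-class test (class stage), the mean best-IoU score (box stage), and all thresholds.

```python
from dataclasses import dataclass
from math import dist

# Hypothetical box layout: (class_id, x1, y1, x2, y2) in pixels.
Box = tuple[int, float, float, float, float]


@dataclass
class PSD:
    """Illustrative descriptor: camera position plus detected object boxes."""
    kf_id: int
    position: tuple[float, float, float]
    boxes: list[Box]

    @property
    def classes(self) -> set[int]:
        return {b[0] for b in self.boxes}


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    iy = max(0.0, min(a[4], b[4]) - max(a[2], b[2]))
    inter = ix * iy
    union = (a[3] - a[1]) * (a[4] - a[2]) + (b[3] - b[1]) * (b[4] - b[2]) - inter
    return inter / union if union > 0 else 0.0


def box_score(query: PSD, cand: PSD) -> float:
    """Mean over query boxes of the best IoU with a same-class candidate box."""
    best = [
        max((iou(q, c) for c in cand.boxes if c[0] == q[0]), default=0.0)
        for q in query.boxes
    ]
    return sum(best) / len(best) if best else 0.0


def pcb_filter(query: PSD, keyframes: list[PSD], radius: float = 2.0,
               min_shared: int = 1, min_box_score: float = 0.3) -> list[int]:
    """Return candidate keyframe ids, best box score first."""
    # Stage 1 (pose): keep keyframes near the query position (assumed gate).
    near = [k for k in keyframes if dist(k.position, query.position) <= radius]
    # Stage 2 (class): keep keyframes sharing enough object classes.
    same = [k for k in near if len(k.classes & query.classes) >= min_shared]
    # Stage 3 (box): keep keyframes whose box layout resembles the query.
    scored = [(box_score(query, k), k) for k in same]
    kept = [(s, k) for s, k in scored if s >= min_box_score]
    kept.sort(key=lambda t: t[0], reverse=True)
    return [k.kf_id for _, k in kept]
```

In a pipeline like the one summarized above, the surviving keyframe ids would then be handed to the 3D-to-2D correspondence and PnP step. Ordering the stages from cheapest to most expensive test is a design choice of this sketch, so that most keyframes are discarded before any box comparison runs.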
