Semantic Object-level Modeling for Robust Visual Camera Relocalization
Yifan Zhu, Lingjuan Miao, Haitao Wu, Zhiqiang Zhou, Weiyi Chen, Longwen Wu
TL;DR
This work tackles robust visual camera relocalization in SLAM by building semantic object-level voxel maps that yield accurate ellipsoidal representations of objects. It integrates object-level voxel modeling with an object-based relocalization strategy that exploits the projection properties of 2D ellipses and 3D ellipsoids, including a P3P-based initial pose and Wasserstein-distance refinement. The method demonstrates improved relocalization robustness and accuracy under viewpoint changes on indoor RGB-D datasets, outperforming OA-SLAM and approaching ORB-SLAM2 performance when combined with point cues. The approach offers real-time performance, seamless integration with RGB-D SLAM, and enhanced reliability for mobile robots in unknown indoor environments.
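The summary above does not spell out how the Wasserstein distance between ellipses is evaluated. A minimal sketch, assuming each ellipse is treated as a 2D Gaussian whose mean is the ellipse center and whose covariance is R diag(a^2, b^2) R^T, is given below. The function names are hypothetical and this mapping is an assumption made for illustration, not necessarily the formulation used by the authors. The closed form for the squared 2-Wasserstein distance between Gaussians is standard, and for 2x2 covariances the trace term reduces to the expression in the code.

```python
import math

def ellipse_to_gaussian(cx, cy, a, b, theta):
    """Map an ellipse to (mean, covariance).
    Assumption: covariance = R diag(a^2, b^2) R^T, with a, b the semi-axes."""
    c, s = math.cos(theta), math.sin(theta)
    sxx = a * a * c * c + b * b * s * s
    sxy = (a * a - b * b) * c * s
    syy = a * a * s * s + b * b * c * c
    return (cx, cy), ((sxx, sxy), (sxy, syy))

def wasserstein2_sq(g1, g2):
    """Squared 2-Wasserstein distance between two 2D Gaussians.
    W2^2 = |m1 - m2|^2 + Tr(S1) + Tr(S2) - 2 Tr((S1^1/2 S2 S1^1/2)^1/2).
    For 2x2 PSD matrices the last trace equals
    sqrt(Tr(S1 S2) + 2 sqrt(det(S1) det(S2)))."""
    (m1, S1), (m2, S2) = g1, g2
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    tr1 = S1[0][0] + S1[1][1]
    tr2 = S2[0][0] + S2[1][1]
    det1 = S1[0][0] * S1[1][1] - S1[0][1] * S1[1][0]
    det2 = S2[0][0] * S2[1][1] - S2[0][1] * S2[1][0]
    # Tr(S1 S2) = sum_ij S1[i][j] * S2[j][i]
    tr12 = sum(S1[i][j] * S2[j][i] for i in range(2) for j in range(2))
    cross = math.sqrt(max(tr12 + 2.0 * math.sqrt(max(det1 * det2, 0.0)), 0.0))
    return dx * dx + dy * dy + tr1 + tr2 - 2.0 * cross

# Example: a detected ellipse versus a projected one (illustrative values only).
detected = ellipse_to_gaussian(100.0, 80.0, 30.0, 15.0, 0.3)
projected = ellipse_to_gaussian(103.0, 84.0, 30.0, 15.0, 0.3)
cost = wasserstein2_sq(detected, projected)
```

Under this assumption the cost stays smooth and finite even when the two ellipses do not overlap, which is one common reason to prefer it over an IoU-style overlap measure in pose refinement.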
Abstract
Visual relocalization is crucial for autonomous visual localization and navigation of mobile robots. Owing to improvements in CNN-based object detection algorithms, the robustness of visual relocalization has been greatly enhanced, especially from viewpoints where classical methods fail. However, ellipsoids (quadrics) generated from axis-aligned object detections may limit the accuracy of the object-level representation and degrade the performance of the visual relocalization system. In this paper, we propose a novel method of automatic object-level voxel modeling for accurate ellipsoidal representations of objects. For visual relocalization, we design a better pose optimization strategy for camera pose recovery that fully utilizes the projection characteristics of 2D fitted ellipses and accurate 3D ellipsoids. All of these modules are fully integrated into a visual SLAM system. Experimental results show that our semantic object-level mapping and object-based visual relocalization methods significantly enhance the performance of visual relocalization in terms of robustness to new viewpoints.
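The projection relationship between 3D ellipsoids and 2D ellipses that the abstract relies on can be illustrated with the standard dual-quadric identity C* = P Q* P^T, where Q* is the 4x4 dual quadric, P the 3x4 camera projection matrix, and C* the resulting 3x3 dual conic. The sketch below is not the authors' implementation. It uses a sphere as the simplest ellipsoid, a hypothetical pinhole camera, and helper names chosen only for illustration.

```python
import math

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def sphere_dual_quadric(center, radius):
    """Dual quadric of a sphere: [[r^2 I - t t^T, -t], [-t^T, -1]]."""
    t = list(center)
    Q = [[(radius ** 2 if i == j else 0.0) - t[i] * t[j] for j in range(3)]
         + [-t[i]] for i in range(3)]
    Q.append([-t[0], -t[1], -t[2], -1.0])
    return Q

def project_dual_quadric(P, Q):
    """Dual conic of the projected outline: C* = P Q* P^T."""
    return matmul(matmul(P, Q), transpose(P))

def ellipse_from_dual_conic(C):
    """Recover center and semi-axes (major, minor) from a dual conic."""
    s = -C[2][2]                      # rescale so that the (3,3) entry is -1
    Cn = [[v / s for v in row] for row in C]
    cx, cy = -Cn[0][2], -Cn[1][2]
    # Shape matrix R diag(a^2, b^2) R^T = upper-left block + c c^T
    sxx = Cn[0][0] + cx * cx
    sxy = Cn[0][1] + cx * cy
    syy = Cn[1][1] + cy * cy
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return cx, cy, math.sqrt((tr + disc) / 2.0), math.sqrt((tr - disc) / 2.0)

# Hypothetical camera: focal length 500 px, principal point (320, 240), pose [I | 0].
P = [[500.0, 0.0, 320.0, 0.0],
     [0.0, 500.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
Q = sphere_dual_quadric((0.0, 0.0, 5.0), 3.0)   # sphere on the optical axis
cx, cy, a, b = ellipse_from_dual_conic(project_dual_quadric(P, Q))
```

For a general ellipsoid, Q* would instead be built from the object's pose and semi-axes, and the recovered ellipse could then be compared against an ellipse fitted to the detection in the image.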
