
Semantic Object-level Modeling for Robust Visual Camera Relocalization

Yifan Zhu, Lingjuan Miao, Haitao Wu, Zhiqiang Zhou, Weiyi Chen, Longwen Wu

TL;DR

This work tackles robust visual camera relocalization in SLAM by building semantic object-level voxel maps that yield accurate ellipsoidal representations of objects. It integrates object-level voxel modelling with an object-based relocalization strategy that exploits the projection properties of 2D ellipses and 3D ellipsoids, including a P3P-based initial pose and Wasserstein-distance refinement. The method demonstrates improved relocalization robustness and accuracy against viewpoint changes on indoor RGBD datasets, outperforming OA-SLAM and approaching ORB-SLAM2 performance when combined with point cues. The approach offers real-time performance, seamless integration with RGBD SLAM, and enhanced reliability for mobile robotics in unknown indoor environments.
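The TL;DR mentions a Wasserstein-distance refinement but does not give its exact form. A common way to compare two ellipses is to treat each one as a 2D Gaussian and use the closed-form 2-Wasserstein distance between Gaussians. The Python sketch below shows that cost, assuming the covariance of an ellipse with semi-axes a, b and orientation angle is R diag(a², b²) Rᵀ. The function names and this scaling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ellipse_to_gaussian(center, semi_axes, angle):
    """Map an ellipse (center, semi-axes (a, b), orientation in radians) to a
    2D Gaussian. ASSUMPTION: covariance = R diag(a^2, b^2) R^T; other
    scalings of the covariance are equally possible."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    a, b = semi_axes
    cov = R @ np.diag([a * a, b * b]) @ R.T
    return np.asarray(center, dtype=float), cov

def spd_sqrt(M):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def wasserstein2_sq(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2):
    |m1 - m2|^2 + tr(S1 + S2 - 2 (S2^1/2 S1 S2^1/2)^1/2)."""
    S2h = spd_sqrt(S2)
    cross = spd_sqrt(S2h @ S1 @ S2h)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))

# Hypothetical usage: compare a detected ellipse with a projected one.
m_det, S_det = ellipse_to_gaussian((320.0, 240.0), (40.0, 25.0), 0.1)
m_proj, S_proj = ellipse_to_gaussian((322.0, 238.0), (38.0, 26.0), 0.15)
cost = wasserstein2_sq(m_det, S_det, m_proj, S_proj)
```

In a refinement loop, a cost of this kind would be summed over matched pairs of detected ellipses and projected ellipsoids and minimized over the camera pose, starting from the P3P estimate; that optimization loop is omitted here.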

Abstract

Visual relocalization is crucial for the autonomous visual localization and navigation of mobile robots. Owing to improvements in CNN-based object detection algorithms, the robustness of visual relocalization has been greatly enhanced, especially from viewpoints where classical methods fail. However, ellipsoids (quadrics) generated from axis-aligned object detections may limit the accuracy of the object-level representation and degrade the performance of the visual relocalization system. In this paper, we propose a novel method of automatic object-level voxel modeling for accurate ellipsoidal representations of objects. For visual relocalization, we design an improved pose optimization strategy for camera pose recovery that fully exploits the projection characteristics of 2D fitted ellipses and accurate 3D ellipsoids. All of these modules are fully integrated into a visual SLAM system. Experimental results show that our semantic object-level mapping and object-based visual relocalization methods significantly improve the robustness of visual relocalization to new viewpoints.
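The abstract relies on the projection relationship between 3D ellipsoids and 2D ellipses. In standard multiple-view geometry, an ellipsoid written as a dual quadric Q* projects through a camera matrix P = K[R | t] to the dual conic C* = P Q* Pᵀ. The NumPy sketch below applies that textbook relation and reads off the image ellipse's center and 2x2 shape matrix. The function names and parameterization are assumptions for illustration, and the code assumes the ellipsoid lies entirely in front of the camera.

```python
import numpy as np

def dual_quadric(semi_axes, R, center):
    """Dual quadric Q* = Z diag(a^2, b^2, c^2, -1) Z^T of an ellipsoid,
    with Z = [[R, t], [0, 1]] (R: 3x3 orientation, t: 3D center)."""
    Z = np.eye(4)
    Z[:3, :3] = R
    Z[:3, 3] = center
    a, b, c = semi_axes
    return Z @ np.diag([a * a, b * b, c * c, -1.0]) @ Z.T

def project_to_ellipse(Q_star, K, R_cw, t_cw):
    """Project a dual quadric with P = K [R_cw | t_cw] (world -> camera).

    Returns the image ellipse center c and shape matrix M such that the
    ellipse is {x : (x - c)^T M^{-1} (x - c) = 1}. Assumes the ellipsoid is
    fully in front of the camera, so C*[2, 2] < 0."""
    P = K @ np.hstack([R_cw, np.reshape(t_cw, (3, 1))])
    C_star = P @ Q_star @ P.T
    C_star = C_star / -C_star[2, 2]  # normalize so that C*[2, 2] = -1
    c = -C_star[:2, 2]               # a normalized dual conic has -c in its last column
    M = C_star[:2, :2] + np.outer(c, c)
    return c, M

# Hypothetical camera and object: unit sphere 5 units in front of the camera.
K = np.array([[100.0, 0.0, 320.0], [0.0, 100.0, 240.0], [0.0, 0.0, 1.0]])
Q = dual_quadric((1.0, 1.0, 1.0), np.eye(3), np.array([0.0, 0.0, 5.0]))
center, M = project_to_ellipse(Q, K, np.eye(3), np.zeros(3))
semi_axes_px = np.sqrt(np.linalg.eigvalsh(M))
```

For this configuration the projection is a circle of radius 100/√24 ≈ 20.4 pixels centered at the principal point (320, 240), which matches the tangent-cone geometry of a sphere of radius 1 seen from distance 5.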

Paper Structure (15 sections, 8 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: Relocalization using a known ellipsoid-based and point-based map. (a): An RGB frame from our collected video sequence used for mapping. (b): There are viewpoint changes between the yellow mapping keyframes and the black ground-truth trajectory of the relocalization sequence. (c)&(d): Blue and red points are frames successfully relocalized by ORB-SLAM2 and our method, respectively. Our method allows the camera to be relocalized from viewpoints where ORB-SLAM2 fails.
  • Figure 2: System overview: dashed boxes are elements newly added to the ORB-SLAM2 backbone. The modules filled with different colors run in separate threads.
  • Figure 3: Overview of the object-based mapping pipeline. TV monitor, keyboard and book are three different sample objects. The object tracking procedure is not shown in the figure. The associated 3D object is continuously updated and filtered using the 2D images, and its pose and ellipsoid model are updated simultaneously.
  • Figure 4: Accurate ellipsoid mapping in three different scenes (I, II and III are TUM fr3/long_office_household, TUM fr2/desk and a customized scene, respectively). From left to right, the columns show the RGB images of the three scenes, the mapping process, voxels and cuboids, the ellipsoid models, and ORB points on the object surfaces.
  • Figure 5: Mapping comparison between our method and OA-SLAM for ellipsoidal landmark mapping on TUM fr3/long_office_household; our method yields more accurate ellipsoidal representations.
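Figure 4 shows voxels and cuboids as intermediate steps before the ellipsoid model, but the captions do not give the fitting procedure. One simple way to go from an object's points to an ellipsoid along those lines is to voxelize the points, fit an oriented cuboid with PCA, and take the ellipsoid inscribed in that cuboid. The sketch below is a hypothetical illustration of that idea only; `voxelize`, `fit_ellipsoid_from_voxels` and the voxel size are assumptions, not the paper's voxel modeling method.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Return the centers of the occupied voxels for an (N, 3) point array."""
    idx = np.unique(np.floor(points / voxel_size).astype(np.int64), axis=0)
    return (idx + 0.5) * voxel_size

def fit_ellipsoid_from_voxels(voxel_centers, voxel_size):
    """HYPOTHETICAL fit: oriented cuboid via PCA, then the inscribed ellipsoid.

    Returns (center, semi_axes, R); the columns of R are the principal axes,
    ordered by decreasing variance."""
    mean = voxel_centers.mean(axis=0)
    centered = voxel_centers - mean
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    R = Vt.T
    if np.linalg.det(R) < 0:  # keep a right-handed frame
        R[:, 2] *= -1
    local = centered @ R      # voxel centers expressed in the principal frame
    lo = local.min(axis=0) - 0.5 * voxel_size  # cuboid bounds incl. voxel extent
    hi = local.max(axis=0) + 0.5 * voxel_size
    center = mean + R @ ((lo + hi) / 2)
    semi_axes = (hi - lo) / 2  # ellipsoid inscribed in the cuboid
    return center, semi_axes, R

# Hypothetical object: a dense grid of points filling a 4 x 2 x 1 box.
grid = np.stack(np.meshgrid(np.arange(0, 4, 0.1), np.arange(0, 2, 0.1),
                            np.arange(0, 1, 0.1), indexing="ij"), axis=-1).reshape(-1, 3)
voxels = voxelize(grid, 0.5)
center, semi_axes, R = fit_ellipsoid_from_voxels(voxels, 0.5)
```

Fitting to voxel centers rather than raw points makes the result less sensitive to uneven point density across an object's surface, which is one plausible motivation for a voxel representation.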