LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation

Jianhao Jiao; Jinhao He; Changkun Liu; Sebastian Aegidius; Xiangcheng Hu; Tristan Braud; Dimitrios Kanoulas

TL;DR

LiteVLoc is a hierarchical visual localization framework that represents the environment with a lightweight topo-metric map. It uses a learning-based feature matcher to establish dense correspondences between sparse keyframes and observations, then refines poses with a geometric solver, which makes it robust to viewpoint changes.

Abstract

This paper presents LiteVLoc, a hierarchical visual localization framework that uses a lightweight topo-metric map to represent the environment. The method consists of three sequential modules that estimate camera poses in a coarse-to-fine manner. Unlike mainstream approaches that rely on detailed 3D representations, LiteVLoc reduces storage overhead by leveraging learning-based feature matching and geometric solvers for metric pose estimation. A novel dataset for the map-free relocalization task is also introduced. Extensive experiments, including localization and navigation in both simulated and real-world scenarios, validate the system's performance and demonstrate its precision and efficiency for large-scale deployment. Code and data will be made publicly available.
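To make the coarse-to-fine idea concrete, the following is a minimal sketch of a coarse stage only: it picks the map keyframe whose global descriptor is most similar to the query image's descriptor. The `Keyframe` container, the use of cosine similarity, and the toy three-dimensional descriptors are illustrative assumptions and are not taken from the paper; the later stages (dense feature matching and the geometric pose solver) are not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Keyframe:
    """Hypothetical map keyframe; not the authors' data structure."""
    name: str
    descriptor: List[float]                 # global image descriptor (assumed)
    position: Tuple[float, float, float]    # keyframe position in the map frame


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def coarse_localize(query_descriptor: List[float], keyframes: List[Keyframe]) -> Keyframe:
    """Coarse stage: return the keyframe most similar to the query descriptor."""
    return max(keyframes, key=lambda kf: cosine_similarity(query_descriptor, kf.descriptor))


# Toy map with made-up descriptors and positions.
keyframes = [
    Keyframe("kf_lab", [1.0, 0.0, 0.0], (0.0, 0.0, 0.0)),
    Keyframe("kf_corridor", [0.0, 1.0, 0.0], (5.0, 0.0, 0.0)),
    Keyframe("kf_outdoor", [0.0, 0.0, 1.0], (9.0, 4.0, 0.0)),
]
best = coarse_localize([0.1, 0.9, 0.2], keyframes)
print(best.name, best.position)
```

In a full pipeline, the retrieved keyframe would only seed the finer stages, which estimate the metric camera pose relative to it.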

Paper Structure (30 sections, 2 equations, 4 figures, 4 tables)

Introduction
Motivation
Contributions
Related Work
VLoc without 3D Representation
VLoc with 3D Representation
Methodology
Preliminary
Problem Definition
LiteVLoc Framework
Map Construction
Map Representation
Keyframe Selection
Coarse-to-Fine Visual Localization
Global Localization
...and 15 more sections

Figures (4)

Figure 1: Pipeline of LiteVLoc. The mapping phase builds a topo-metric map from a set of posed images (Section \ref{sec:mapping}). Selected cameras are marked in red; the others are marked in blue. The map has two levels, with edges marked in black: level-$1$ for planning and level-$2$ for VLoc. The localization phase estimates camera poses in a coarse-to-fine manner (Section \ref{sec:visual_localization}). The map and real-time localization results can support applications such as image goal navigation (Section \ref{sec:navigation}). The image illustrating PnP is derived from sheffer2020pnp.
Figure 2: Sample images and the distribution of camera poses in the proposed dataset for evaluating map-free relocalization methods.
Figure 3: Real-world experiment with a legged robot, with the red curve showing the robot's trajectory estimated by our VLoc method. The nodes (visualized as axes: red-X, green-Y, and blue-Z) and connected edges indicate the topo-metric map's structure. The robot is guided by a sequence of goal images captured by the AR glasses (right). It starts inside a room, navigates outside, follows a circular route, and returns. In total, it traverses $94.5\,\mathrm{m}$ in $314\,\mathrm{s}$ at an average speed of $0.3\,\mathrm{m/s}$. Green boxes A-E highlight key planning events along the route: (A) a narrow pathway, (B) a descending slope, (C and D) outdoor pathways, and (E) the lab region. Refer to the video for a full demonstration of the navigation process.
Figure 4: Estimated trajectories on a real-world sequence traversing a bridge. VLoc's trajectory closely aligns with the ground truth, demonstrating high accuracy. The sudden changes in both RobotOdom and RobotOdom-PS are due to corrections from VLoc's results.
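The Figure 1 caption describes a topo-metric map whose nodes carry poses and whose edges support planning. As a rough illustration, the sketch below stores node positions and undirected edges, then plans a route with Dijkstra's algorithm using Euclidean edge lengths. The `TopoMetricMap` class, the choice of Dijkstra, and the Euclidean edge weights are assumptions made for this example; the paper's actual map structure and planner may differ.

```python
import heapq
import math


class TopoMetricMap:
    """Hypothetical topo-metric map: nodes with 3D positions, undirected edges."""

    def __init__(self):
        self.positions = {}   # node id -> (x, y, z) in the map frame
        self.edges = {}       # node id -> set of neighbouring node ids

    def add_node(self, node_id, position):
        self.positions[node_id] = position
        self.edges.setdefault(node_id, set())

    def add_edge(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def edge_length(self, a, b):
        # Assumed edge weight: straight-line distance between node positions.
        return math.dist(self.positions[a], self.positions[b])

    def shortest_path(self, start, goal):
        """Dijkstra search; returns (total length, list of node ids)."""
        queue = [(0.0, start, [start])]
        visited = set()
        while queue:
            cost, node, path = heapq.heappop(queue)
            if node == goal:
                return cost, path
            if node in visited:
                continue
            visited.add(node)
            for nxt in self.edges[node]:
                if nxt not in visited:
                    step = self.edge_length(node, nxt)
                    heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
        return math.inf, []


# Toy map with made-up node positions (metres).
topo_map = TopoMetricMap()
topo_map.add_node("A", (0.0, 0.0, 0.0))
topo_map.add_node("B", (3.0, 0.0, 0.0))
topo_map.add_node("C", (3.0, 4.0, 0.0))
topo_map.add_node("D", (0.0, 5.0, 0.0))
for a, b in [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]:
    topo_map.add_edge(a, b)

length, route = topo_map.shortest_path("A", "C")
print(route, length)
```

Keeping only node poses and connectivity, rather than a dense 3D model, is what keeps such a map lightweight; the route's nodes could then serve as intermediate goals for navigation.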
