Augmented Reality without Borders: Achieving Precise Localization Without Maps

Albert Gassol Puigjaner, Irvin Aloise, Patrik Schmuck

TL;DR

This work introduces MARLoc, a novel localization framework for AR applications. It uses known relative transformations within image sequences to perform intra-sequence triangulation, generating 3D-2D correspondences for pose estimation and refinement, and thereby eliminates the need for pre-built SfM maps.

Abstract

Visual localization is crucial for Computer Vision and Augmented Reality (AR) applications, where determining the position and orientation of the camera or device is essential for interacting accurately with the physical environment. Traditional methods rely on detailed 3D maps constructed using Structure from Motion (SfM) or Simultaneous Localization and Mapping (SLAM), whose construction is computationally expensive and impractical for dynamic or large-scale environments. We introduce MARLoc, a novel localization framework for AR applications that uses known relative transformations within image sequences to perform intra-sequence triangulation, generating 3D-2D correspondences for pose estimation and refinement. MARLoc eliminates the need for pre-built SfM maps, providing accurate and efficient localization suitable for dynamic outdoor environments. Evaluation on benchmark datasets and in real-world experiments demonstrates MARLoc's state-of-the-art performance and robustness. By integrating MARLoc into an AR device, we highlight its capability to achieve precise localization in real-world outdoor scenarios, showcasing its practical effectiveness and potential to enhance visual localization in AR applications.
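To make the idea of intra-sequence triangulation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a pinhole camera with known intrinsics `K` and two query frames whose relative transformation is known, and it uses plain linear (DLT) triangulation to recover a 3D point in the local frame of the sequence. All names (`projection_matrix`, `triangulate_dlt`, `project`) and the toy values are hypothetical and chosen for illustration only.

```python
import numpy as np

def projection_matrix(K, T_cam_from_local):
    """Build a 3x4 projection matrix from intrinsics K and a 4x4
    transform that maps local-frame points into the camera frame."""
    return K @ T_cam_from_local[:3, :]

def triangulate_dlt(P_a, P_b, uv_a, uv_b):
    """Linear (DLT) triangulation of one point observed at pixel uv_a
    in view a and at pixel uv_b in view b."""
    A = np.stack([
        uv_a[0] * P_a[2] - P_a[0],
        uv_a[1] * P_a[2] - P_a[1],
        uv_b[0] * P_b[2] - P_b[0],
        uv_b[1] * P_b[2] - P_b[1],
    ])
    # The homogeneous solution is the right singular vector associated
    # with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]

def project(P, X):
    """Project a 3D point X to pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Toy setup (assumed values): frame a defines the local frame, and
# frame b sits 0.5 m to its right with the same orientation.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T_a = np.eye(4)
T_b = np.eye(4)
T_b[0, 3] = -0.5  # local-to-camera translation for a camera at x = +0.5

P_a = projection_matrix(K, T_a)
P_b = projection_matrix(K, T_b)

# Simulate a matched keypoint pair by projecting a known 3D point.
X_true = np.array([0.2, -0.1, 4.0])
uv_a = project(P_a, X_true)
uv_b = project(P_b, X_true)

X_est = triangulate_dlt(P_a, P_b, uv_a, uv_b)
print("triangulated point:", X_est)
```

In a pipeline of this kind, each triangulated point would then be paired with its matched 2D keypoint in a reference image, giving the 3D-2D correspondences that a pose solver consumes. The solver itself is not shown here.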

Paper Structure

This paper contains 12 sections, 2 equations, 6 figures, 2 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Given a sequence of locally posed query frames $\mathcal{T}_q$ and a set of posed reference frames $\mathcal{T}_r$, our objective is to find $\{{}_{r}\mathbf{T}_{1}^{Q}, {}_{r}\mathbf{T}_{2}^{Q}, \cdots, {}_{r}\mathbf{T}_{N}^{Q}\}$, i.e., the poses of the localized query frames with respect to the reference frame. Our approach needs no prior geometry for the reference images (e.g., triangulated feature points); it relies only on posed reference images.
  • Figure 2: The query frames are localized using only reference images from the database. MARLoc uses NetVLAD [Arandjelovi2015NetVLAD] + APGeM [Revaud2019ApGem] for image retrieval, while SuperPoint [detone18superpoint] and SuperGlue [sarlin20superglue] are used to extract and match local descriptors. We triangulate query 3D points by leveraging the prior poses of the query frames, allowing us to perform pose estimation via the algorithm. The final set of query poses is refined via .
  • Figure 3: Median rotation error plotted against median translation error for each scene of the Niantic Map-Free dataset. The error distribution of our approach is concentrated in the bottom-left corner, with its center of mass in a low-error range.
  • Figure 4: Recall on the Niantic Map-Free dataset at different APE thresholds. Best viewed when zoomed in.
  • Figure 5: Recall on the LaMAR benchmark at different APE thresholds. Best viewed when zoomed in.
  • ...and 1 more figure
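The recall curves described in the captions of Figures 4 and 5 can be read as the fraction of query frames whose error falls at or below each threshold. The sketch below illustrates that computation under two loudly stated assumptions: that APE is available as one scalar error per frame (here treated as metres), and that recall is counted with a "less than or equal" rule. The function name `recall_at_thresholds` and the error values are hypothetical and are not results from the paper.

```python
import numpy as np

def recall_at_thresholds(errors, thresholds):
    """Return {threshold: fraction of errors <= threshold}."""
    errors = np.asarray(errors, dtype=float)
    return {float(t): float(np.mean(errors <= t)) for t in thresholds}

# Made-up per-frame errors, for illustration only (assumed unit: metres).
ape_m = [0.05, 0.2, 0.4, 1.5, 3.0]
recall = recall_at_thresholds(ape_m, [0.1, 0.5, 1.0, 2.0])

for threshold, value in recall.items():
    print(f"recall @ {threshold:.1f}: {value:.2f}")
```

Plotting these values against the thresholds gives a monotonically non-decreasing curve, which is the general shape one would expect from a recall-versus-threshold plot.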