KS-APR: Keyframe Selection for Robust Absolute Pose Regression

Changkun Liu, Yukun Zhao, Tristan Braud

TL;DR

This work tackles the vulnerability of Absolute Pose Regression (APR) in markerless mobile AR when input images deviate from training data. It introduces KS-APR, a lightweight, APR-agnostic pipeline that performs pose-based image retrieval to identify training-set keyframes and validates pose estimates through local feature matching, discarding unreliable frames. Across indoor 7Scenes and outdoor Cambridge datasets, KS-APR yields substantial reductions in median position and orientation errors while maintaining a high fraction of useful keyframes, with an overhead of only about 15 ms. The method enables state-of-the-art APRs to beat both single-image and sequential APR baselines, and it integrates smoothly with existing AR pipelines and VIO systems for robust, real-time visual localization in changing environments.

Abstract

Markerless Mobile Augmented Reality (AR) aims to anchor digital content in the physical world without using specific 2D or 3D objects. Absolute Pose Regressors (APR) are end-to-end machine learning solutions that infer the device's pose from a single monocular image. Thanks to their low computation cost, they can be directly executed on the constrained hardware of mobile AR devices. However, APR methods tend to yield significant inaccuracies for input images that are too distant from the training set. This paper introduces KS-APR, a pipeline that assesses the reliability of an estimated pose with minimal overhead by combining the inference results of the APR and the prior images in the training set. Mobile AR systems tend to rely upon visual-inertial odometry to track the relative pose of the device during the experience. As such, KS-APR favours reliability over frequency, discarding unreliable poses. This pipeline can integrate most existing APR methods to improve accuracy by filtering unreliable images with their pose estimates. We implement the pipeline on three types of APR models on indoor and outdoor datasets. The median error on position and orientation is reduced for all models, and the proportion of large errors is minimized across datasets. Our method enables state-of-the-art APRs such as DFNetdm to outperform single-image and sequential APR methods. These results demonstrate the scalability and effectiveness of KS-APR for visual localization tasks that do not require one-shot decisions.
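The reliability check described above (retrieve the training image whose pose is closest to the APR estimate, then accept the query only if it shares enough local feature matches with that image) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the combined pose score with weight `w_rot`, the ratio test, and the `min_matches` threshold are all hypothetical choices, and descriptors are plain tuples of floats rather than the output of a real feature extractor.

```python
import math


def rotation_angle_deg(q1, q2):
    """Angle in degrees between two unit quaternions (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return math.degrees(2.0 * math.acos(min(1.0, dot)))


def retrieve_nearest(pred_t, pred_q, train_poses, w_rot=0.05):
    """Return the index of the training pose closest to the predicted pose.

    train_poses is a list of (position, quaternion) pairs.
    The score (position distance plus w_rot times angular distance) is an
    assumed weighting chosen for illustration only.
    """
    def score(pose):
        t, q = pose
        return math.dist(pred_t, t) + w_rot * rotation_angle_deg(pred_q, q)

    return min(range(len(train_poses)), key=lambda i: score(train_poses[i]))


def count_ratio_matches(desc_a, desc_b, ratio=0.8):
    """Count descriptors in desc_a whose best match in desc_b passes a ratio test."""
    matches = 0
    for a in desc_a:
        dists = sorted(math.dist(a, b) for b in desc_b)
        # A match needs a clearly better best neighbour than second-best one.
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def is_keyframe(query_desc, retrieved_desc, min_matches=50):
    """Accept the query as a keyframe only if enough matches are found."""
    return count_ratio_matches(query_desc, retrieved_desc) >= min_matches


# Toy usage: three training poses, all with identity orientation.
identity = (1.0, 0.0, 0.0, 0.0)
train_poses = [
    ((0.0, 0.0, 0.0), identity),
    ((1.0, 0.0, 0.0), identity),
    ((5.0, 5.0, 0.0), identity),
]
nearest = retrieve_nearest((0.9, 0.1, 0.0), identity, train_poses)
```

In a real system the descriptors would come from a local feature extractor run on the query image and on the retrieved training image, and a frame that fails the check would simply be discarded rather than corrected.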

Paper Structure

This paper contains 20 sections, 11 equations, 5 figures, 8 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Keyframe selection pipeline for APR with feature matching filter. The database stores the training set images and their corresponding ground truth poses. The green and red flows represent independent branches, depending on whether an image is a keyframe. Query image $I_{i}$ is a keyframe but query image $I_{i+1}$ is not.
  • Figure 2: COLMAP [schonberger2016structure] 3D reconstruction of the images in KingsCollege from the Cambridge dataset [kendall2015posenet]. Two images with many feature matches may not be spatially close (e.g., the blue and purple boxes), leading to high inaccuracy in most APRs.
  • Figure 3: The framework of mobile markerless AR with KS-APR. The grey circles are keyframes. White circles are not keyframes and KS-APR filters out their predictions. We can use the tracking module in ARKit or ARCore to bridge keyframes and non-keyframes.
  • Figure 4: Camera ground truth trajectory. Green: training set; Blue: keyframes in the test set; Red: filtered images in the test set. Our proposed method primarily identifies frames close to the training set as keyframes for improving accuracy.
  • Figure 5: Examples of keyframes and removed images in Cambridge and 7Scenes. For each pair, the left side is the query image from the test set and the right side is the image retrieved by Algorithm 1 using a pretrained APR. The upper row of each subfigure shows a test-set query image selected as a keyframe; the lower row shows a query image with too few matches to be considered a keyframe.
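
The bridging idea in the Figure 3 caption, where a tracking module carries the pose between accepted keyframes, can be illustrated with a small sketch. It assumes poses are 4x4 rigid transforms stored as nested lists, and that the tracker reports the device pose in its own local frame. The `KeyframeBridge` class and its method names are hypothetical; the code does not call ARKit or ARCore, and it is not the authors' implementation.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(T):
    """Invert a rigid transform [R | t] as [R^T | -R^T t]."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def translation(x, y, z):
    """Build a pure-translation transform (helper for the toy example)."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


class KeyframeBridge:
    """Keeps the world-from-tracker alignment fixed by the last keyframe."""

    def __init__(self):
        self.world_from_vio = None  # unknown until the first keyframe

    def update(self, vio_pose, apr_pose=None):
        """Return the device pose in the world frame, or None if unknown.

        vio_pose: device pose in the tracker's local frame.
        apr_pose: APR pose in the world frame if the frame was accepted
                  as a keyframe, otherwise None.
        """
        if apr_pose is not None:
            # Re-anchor: world_from_vio = world_from_device * device_from_vio.
            self.world_from_vio = matmul(apr_pose, rigid_inverse(vio_pose))
            return apr_pose
        if self.world_from_vio is None:
            return None
        # Non-keyframe: propagate the last alignment with the tracked pose.
        return matmul(self.world_from_vio, vio_pose)


# Toy usage with translation-only poses.
bridge = KeyframeBridge()
before = bridge.update(translation(0.0, 0.0, 0.0))
at_keyframe = bridge.update(translation(1.0, 0.0, 0.0),
                            apr_pose=translation(10.0, 0.0, 0.0))
after = bridge.update(translation(2.0, 0.0, 0.0))
```

Until the first keyframe is accepted the sketch returns no world pose at all, which mirrors the preference for reliability over frequency stated in the abstract.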