KS-APR: Keyframe Selection for Robust Absolute Pose Regression
Changkun Liu, Yukun Zhao, Tristan Braud
TL;DR
This work tackles the vulnerability of Absolute Pose Regression (APR) in markerless mobile AR when input images deviate from the training data. It introduces KS-APR, a lightweight, APR-agnostic pipeline that performs pose-based image retrieval to identify training-set keyframes and validates pose estimates through local feature matching, discarding unreliable frames. Across the indoor 7Scenes and outdoor Cambridge datasets, KS-APR substantially reduces median position and orientation errors while retaining a high fraction of useful keyframes, at an overhead of only about 15 ms. The method enables state-of-the-art APRs to outperform both single-image and sequential APR baselines, and it integrates smoothly with existing AR pipelines and visual-inertial odometry (VIO) systems for robust, real-time visual localization in changing environments.
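The pose-based retrieval step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each training keyframe is stored with its ground-truth pose, given as a position in metres plus a unit quaternion (w, x, y, z). It also ranks keyframes by a hypothetical combined distance whose weighting `metres_per_degree` is an arbitrary choice made here. The helper names (`rotation_angle_deg`, `pose_distance`, `retrieve_keyframe`) are likewise hypothetical.

```python
import math
from typing import Sequence, Tuple

# Pose = (position in metres, unit quaternion (w, x, y, z))
Pose = Tuple[Tuple[float, float, float], Tuple[float, float, float, float]]


def rotation_angle_deg(q1, q2) -> float:
    """Angle (degrees) of the relative rotation between two unit quaternions."""
    # |<q1, q2>| handles the q / -q double cover; clamp guards against rounding above 1.
    dot = min(1.0, abs(sum(a * b for a, b in zip(q1, q2))))
    return math.degrees(2.0 * math.acos(dot))


def pose_distance(p: Pose, q: Pose, metres_per_degree: float = 0.1) -> float:
    """Hypothetical combined distance: translation (m) plus weighted rotation (deg)."""
    return math.dist(p[0], q[0]) + metres_per_degree * rotation_angle_deg(p[1], q[1])


def retrieve_keyframe(predicted: Pose, train_poses: Sequence[Pose]) -> int:
    """Index of the training keyframe whose stored pose is closest to the APR estimate."""
    return min(range(len(train_poses)), key=lambda i: pose_distance(predicted, train_poses[i]))


# Toy training set: two keyframes facing the same way, one rotated 90 degrees about y.
train = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)),
    ((2.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (math.sqrt(0.5), 0.0, math.sqrt(0.5), 0.0)),
]
predicted = ((1.8, 0.1, 0.0), (1.0, 0.0, 0.0, 0.0))
print(retrieve_keyframe(predicted, train))  # -> 1
```

Comparing stored poses instead of image descriptors keeps this lookup cheap. The retrieved keyframe then serves as the reference image for the local feature matching check.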
Abstract
Markerless Mobile Augmented Reality (AR) aims to anchor digital content in the physical world without using specific 2D or 3D objects. Absolute Pose Regressors (APRs) are end-to-end machine learning solutions that infer the device's pose from a single monocular image. Thanks to their low computational cost, they can be executed directly on the constrained hardware of mobile AR devices. However, APR methods tend to yield significant inaccuracies for input images that are too distant from the training set. This paper introduces KS-APR, a pipeline that assesses the reliability of an estimated pose with minimal overhead by combining the APR's inference results with the prior images in the training set. Mobile AR systems tend to rely on visual-inertial odometry to track the relative pose of the device during the experience. Because odometry can maintain tracking between accepted absolute poses, KS-APR favours reliability over frequency, discarding unreliable poses. The pipeline can be combined with most existing APR methods, improving their accuracy by filtering out unreliable images together with their pose estimates. We apply the pipeline to three types of APR models and evaluate it on indoor and outdoor datasets. The median position and orientation errors are reduced for all models, and the proportion of large errors is minimized across datasets. Our method enables state-of-the-art APRs such as DFNetdm to outperform single-image and sequential APR methods. These results demonstrate the scalability and effectiveness of KS-APR for visual localization tasks that do not require one-shot decisions.
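The keep-or-discard decision and its effect on median errors can be sketched as below. The sketch assumes, hypothetically, that a local feature matcher (not shown) has already counted the matches between each query image and its retrieved keyframe. A pose is accepted when that count reaches `min_matches`. The threshold of 50 and all function names are illustrative, not taken from the paper. Rotation error is measured as the angle of the relative rotation, 2·arccos(|⟨q_est, q_gt⟩|).

```python
import math
import statistics
from typing import List, Sequence, Tuple

# Pose = (position in metres, unit quaternion (w, x, y, z))
Pose = Tuple[Tuple[float, float, float], Tuple[float, float, float, float]]


def rotation_error_deg(q_est, q_gt) -> float:
    """Angle (degrees) of the relative rotation between two unit quaternions."""
    dot = min(1.0, abs(sum(a * b for a, b in zip(q_est, q_gt))))
    return math.degrees(2.0 * math.acos(dot))


def ks_filter(match_counts: Sequence[int], min_matches: int = 50) -> List[int]:
    """Hypothetical acceptance rule: keep a frame only if it has enough
    local-feature matches with its retrieved training keyframe."""
    return [i for i, n in enumerate(match_counts) if n >= min_matches]


def median_errors(est: Sequence[Pose], gt: Sequence[Pose], keep: Sequence[int]) -> Tuple[float, float]:
    """Median position error (m) and rotation error (deg) over the kept frames."""
    pos = [math.dist(est[i][0], gt[i][0]) for i in keep]
    rot = [rotation_error_deg(est[i][1], gt[i][1]) for i in keep]
    return statistics.median(pos), statistics.median(rot)


# Toy data: frame 1 is a bad estimate (5 m off) with few matches.
identity = (1.0, 0.0, 0.0, 0.0)
est = [((0.1, 0.0, 0.0), identity), ((5.0, 0.0, 0.0), identity), ((0.0, 0.2, 0.0), identity)]
gt = [((0.0, 0.0, 0.0), identity)] * 3
match_counts = [120, 8, 75]

keep = ks_filter(match_counts)
print("kept frames:", keep, f"({len(keep)}/{len(est)})")
print("median errors, all frames:", median_errors(est, gt, range(len(est))))
print("median errors, kept frames:", median_errors(est, gt, keep))
```

In this toy example, the second frame's estimate is 5 m off and has only 8 matches, so it is discarded. The median position error over the kept frames falls from 0.2 m to about 0.15 m, and two of the three frames are retained. In a deployed system, a discarded frame would leave tracking to visual-inertial odometry until the next accepted pose.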
