HR-APR: APR-agnostic Framework with Uncertainty Estimation and Hierarchical Refinement for Camera Relocalisation
Changkun Liu, Shuai Chen, Yukun Zhao, Huajian Huang, Victor Prisacariu, Tristan Braud
TL;DR
This work addresses the unreliability of absolute pose regressors (APRs) by introducing HR-APR, an APR-agnostic framework that estimates pose reliability through a lightweight pose-based retrieval of training embeddings and uses this uncertainty to guide a NeFeS-based refinement. The uncertainty module is modular and can be plugged into diverse APR architectures. It reduces refinement overhead by 27.4% on the indoor 7Scenes dataset and 15.2% on the outdoor Cambridge Landmarks dataset while preserving state-of-the-art single-image pose accuracy. Experiments on both datasets show a consistent correlation between the estimated uncertainty and pose error, which enables selective refinement and improved robustness without architecture-specific modifications. The approach is efficient, storage-friendly, and suitable for real-time camera relocalisation, making it practical for robotics and AR applications.
Abstract
Absolute Pose Regressors (APRs) directly estimate camera poses from monocular images, but their accuracy varies across queries. Uncertainty-aware APRs report an uncertainty alongside the estimated pose, which mitigates the impact of unreliable predictions. However, existing uncertainty modelling techniques are often coupled to a specific APR architecture, resulting in suboptimal performance compared to state-of-the-art (SOTA) APR methods. This work introduces a novel APR-agnostic framework, HR-APR, that formulates uncertainty estimation as cosine similarity estimation between the query and database features. Because it neither relies on nor alters the APR network architecture, the framework is flexible and computationally efficient. In addition, we use the estimated uncertainty to guide pose refinement and enhance APR performance. Extensive experiments demonstrate the effectiveness of our framework, which reduces computational overhead by 27.4% and 15.2% on the 7Scenes and Cambridge Landmarks datasets, respectively, while maintaining SOTA accuracy among single-image APRs.
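To make the idea of similarity-based uncertainty concrete, the following is a minimal sketch, not the authors' implementation. It assumes that training features and their camera positions are stored as NumPy arrays, that retrieval uses only Euclidean distance between positions (ignoring orientation), that the best similarity among the retrieved entries serves as the reliability score, and that a fixed threshold decides whether refinement is needed. The function names (`retrieve_by_pose`, `cosine_similarity`, `is_reliable`) and the parameters `k` and `threshold` are hypothetical.

```python
import numpy as np


def retrieve_by_pose(pred_position, train_positions, k=3):
    """Indices of the k training positions closest to the predicted position.

    Assumption: retrieval uses position distance only.
    """
    dists = np.linalg.norm(train_positions - pred_position, axis=1)
    return np.argsort(dists)[:k]


def cosine_similarity(query_feat, feats):
    """Cosine similarity between one query vector and each row of feats."""
    q = query_feat / np.linalg.norm(query_feat)
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ q


def is_reliable(query_feat, pred_position, train_feats, train_positions,
                k=3, threshold=0.9):
    """Return (reliable, score).

    Assumption: the score is the highest similarity among the retrieved
    training features, and the threshold value is illustrative.
    """
    idx = retrieve_by_pose(pred_position, train_positions, k)
    score = float(cosine_similarity(query_feat, train_feats[idx]).max())
    return score >= threshold, score
```

In such a scheme, a query whose score falls below the threshold would be passed to the refinement stage, while a query above it would keep the APR prediction as is, which is where the saving in computation would come from.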
