Neural Refinement for Absolute Pose Regression with Feature Synthesis
Shuai Chen, Yash Bhalgat, Xinghui Li, Jiawang Bian, Kejie Li, Zirui Wang, Victor Adrian Prisacariu
TL;DR
This work addresses a limitation of Absolute Pose Regression (APR): poses are predicted through purely 2D inference, without strong geometric priors. It introduces a test-time refinement framework built on an implicit 3D feature field, the Neural Feature Synthesizer (NeFeS), which renders dense novel-view features so that the pose can be refined by optimizing a feature-metric loss $L_{feature}$. A progressive training strategy and a Feature Fusion module make the rendered features more robust and enable end-to-end backpropagation, improving APR without extra unlabeled data. On Cambridge Landmarks and 7-Scenes, the method achieves state-of-the-art single-image APR accuracy with multiple backbones, offering a practical middle ground between APR and full geometry-based localization with favorable efficiency.
Abstract
Absolute Pose Regression (APR) methods use deep neural networks to directly regress camera poses from RGB images. However, the predominant APR architectures rely only on 2D operations during inference, which limits pose estimation accuracy due to the lack of 3D geometry constraints or priors. In this work, we propose a test-time refinement pipeline that leverages implicit geometric constraints using a robust feature field to enhance the ability of APR methods to use 3D information during inference. We also introduce a novel Neural Feature Synthesizer (NeFeS) model, which encodes 3D geometric features during training and directly renders dense novel-view features at test time to refine APR methods. To enhance the robustness of our model, we introduce a feature fusion module and a progressive training strategy. Our proposed method achieves state-of-the-art single-image APR accuracy on indoor and outdoor datasets.
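
The test-time refinement loop described above can be illustrated with a deliberately small toy, which is not the authors' implementation. Everything in it is an assumption made for illustration: a 2D translation stands in for the 6-DoF camera pose, the hypothetical `render_features` function stands in for NeFeS, a squared-error loss stands in for $L_{feature}$ (whose exact form is not given here), and central finite differences replace the backpropagation used in the paper. The sketch only shows the overall idea: start from a coarse pose, render features at that pose, compare them with the features of the query image, and update the pose to reduce the mismatch.

```python
# Toy sketch only: a 2D translation stands in for the 6-DoF camera pose.
from typing import List, Sequence, Tuple

Pose = Tuple[float, float]

# Hypothetical scene: 2D landmarks standing in for the learned 3D feature field.
LANDMARKS = [(0.0, 1.0), (2.0, -1.0), (-1.5, 0.5), (3.0, 2.0)]


def render_features(pose: Sequence[float]) -> List[float]:
    """Stand-in for the feature renderer: landmark offsets as seen from `pose`."""
    tx, ty = pose
    feats: List[float] = []
    for lx, ly in LANDMARKS:
        feats.extend((lx - tx, ly - ty))
    return feats


def feature_loss(pose: Sequence[float], target: Sequence[float]) -> float:
    """Assumed feature-metric loss: mean squared error per landmark."""
    rendered = render_features(pose)
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(LANDMARKS)


def refine_pose(initial: Pose, target: Sequence[float],
                steps: int = 100, lr: float = 0.1, eps: float = 1e-4) -> Pose:
    """Gradient descent on the loss, using central finite differences
    in place of backpropagation through a neural renderer."""
    pose = list(initial)
    for _ in range(steps):
        grad = []
        for i in range(len(pose)):
            plus, minus = pose.copy(), pose.copy()
            plus[i] += eps
            minus[i] -= eps
            diff = feature_loss(plus, target) - feature_loss(minus, target)
            grad.append(diff / (2 * eps))
        pose = [p - lr * g for p, g in zip(pose, grad)]
    return (pose[0], pose[1])


true_pose = (0.7, -0.3)                      # unknown at test time in practice
query_features = render_features(true_pose)  # stand-in for query-image features
coarse_pose = (1.2, 0.4)                     # stand-in for the APR prediction
refined_pose = refine_pose(coarse_pose, query_features)

print("loss before:", feature_loss(coarse_pose, query_features))
print("loss after: ", feature_loss(refined_pose, query_features))
```

In this toy the loss is a simple quadratic in the pose, so plain gradient descent converges from any starting point. With a real rendered feature field the loss is non-convex, and refinement depends on the initial APR estimate being reasonably close to the true pose.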
