HyperPose: Hypernetwork-Infused Camera Pose Localization and an Extended Cambridge Landmarks Dataset
Ron Ferens, Yosi Keller
TL;DR
HyperPose tackles domain gaps in absolute pose regressors (APRs) by embedding a hypernetwork that generates input-conditioned weights for the pose regression heads, enabling adaptive feature emphasis during inference. The approach applies to both single- and multi-scene APRs; MS-HyperPose employs DETR-inspired transformers to process activation maps, while the hypernetwork supplies regression-head weights conditioned on the input. A new Extended Cambridge Landmarks (ECL) dataset benchmarks robustness to seasonal and lighting variations. Experiments show that HyperPose improves over state-of-the-art APRs on Cambridge and 7Scenes, while MS-HyperPose achieves top multi-scene results with competitive latency (≈$30.93$ ms) and model size (≈$571$ MB). The contributions include a general hypernetwork-enabled APR framework, quantitative gains across diverse datasets, and the ECL benchmark to drive development of more invariant localization methods, with open-source code and models provided. The pose is represented as $p = \langle \mathbf{x}, \mathbf{q} \rangle$, where $\mathbf{x} \in \mathbb{R}^3$ and $\mathbf{q} \in \mathbb{R}^4$ encode position and orientation, respectively.
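To make the pose representation concrete, the following is a minimal sketch of a pose container holding a 3D position and a 4D orientation quaternion. The `Pose` class and the two error functions are hypothetical names introduced here for illustration. The error definitions (Euclidean distance for position, the angle between unit quaternions for orientation) are the ones commonly reported for APRs, and are an assumption rather than something stated in this section.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Pose:
    """Hypothetical container for p = <x, q>: position x in R^3, quaternion q in R^4."""
    x: Tuple[float, float, float]
    q: Tuple[float, float, float, float]

    def normalized(self) -> "Pose":
        """Return a copy whose quaternion has unit length."""
        norm = math.sqrt(sum(c * c for c in self.q))
        if norm == 0.0:
            raise ValueError("zero quaternion cannot represent an orientation")
        return Pose(self.x, tuple(c / norm for c in self.q))


def position_error(a: Pose, b: Pose) -> float:
    """Euclidean distance between the two positions (assumed metric)."""
    return math.dist(a.x, b.x)


def orientation_error_deg(a: Pose, b: Pose) -> float:
    """Angle in degrees between the two orientations (assumed metric).

    The absolute value of the dot product handles the fact that q and -q
    describe the same rotation.
    """
    qa, qb = a.normalized().q, b.normalized().q
    dot = abs(sum(ca * cb for ca, cb in zip(qa, qb)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))
```

For example, comparing the identity orientation with a quarter turn about one axis should give an orientation error of 90 degrees, independent of the quaternion component ordering used.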
Abstract
In this work, we propose HyperPose, which utilizes hypernetworks in absolute camera pose regressors. The inherent appearance variations in natural scenes, attributable to environmental conditions, perspective, and lighting, induce a significant domain disparity between the training and test datasets. This disparity degrades the precision of contemporary localization networks. To mitigate this, we advocate for incorporating hypernetworks into single-scene and multi-scene camera pose regression models. During inference, the hypernetwork dynamically computes adaptive weights for the localization regression heads based on the particular input image, effectively narrowing the domain gap. Using indoor and outdoor datasets, we evaluate the HyperPose methodology across multiple established absolute pose regression architectures. We also introduce and share the Extended Cambridge Landmarks (ECL), a novel localization dataset based on the Cambridge Landmarks dataset that captures its scenes across multiple seasons with significantly varying appearance conditions. Our empirical experiments demonstrate that HyperPose yields notable performance enhancements for single- and multi-scene architectures. We have made our source code, pre-trained models, and the ECL dataset openly available.
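The inference-time mechanism described above can be illustrated with a toy sketch: a hypernetwork maps the features of the current input to the weights and biases of a regression head, and that generated head is then applied to the same features. This is not the authors' architecture. The class `HyperRegressionHead`, the layer sizes, the single hidden layer, and the use of one shared feature vector for both the hypernetwork and the head are all assumptions made for illustration, and the parameters are random rather than trained.

```python
import math
import random


def linear(weights, bias, x):
    """Compute weights @ x + bias for a row-major list-of-lists matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_linear(rng, out_dim, in_dim, scale=0.1):
    """Random (untrained) parameters for a linear layer; biases start at zero."""
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class HyperRegressionHead:
    """Toy head whose linear layer is generated per input by a hypernetwork."""

    def __init__(self, feat_dim, out_dim, hyper_dim=8, seed=0):
        rng = random.Random(seed)
        self.feat_dim, self.out_dim = feat_dim, out_dim
        # Hypernetwork: features -> hidden embedding -> flattened head parameters.
        self.h1 = init_linear(rng, hyper_dim, feat_dim)
        n_params = out_dim * feat_dim + out_dim  # head weights plus head biases
        self.h2 = init_linear(rng, n_params, hyper_dim)

    def generate_head(self, feats):
        """Return input-conditioned (weights, bias) for the regression head."""
        hidden = [math.tanh(v) for v in linear(*self.h1, feats)]
        flat = linear(*self.h2, hidden)
        split = self.out_dim * self.feat_dim
        weights = [flat[i * self.feat_dim:(i + 1) * self.feat_dim]
                   for i in range(self.out_dim)]
        return weights, flat[split:]

    def __call__(self, feats):
        weights, bias = self.generate_head(feats)
        return linear(weights, bias, feats)


def normalize(q):
    """Scale a quaternion to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]


def regress_pose(feats, pos_head, rot_head):
    """Predict position x (3 values) and unit quaternion q (4 values)."""
    return pos_head(feats), normalize(rot_head(feats))
```

In this sketch, two different feature vectors yield two different sets of head weights, which is the property the abstract relies on: the regression head adapts to each input image instead of staying fixed after training.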
