Extreme Rotation Estimation in the Wild
Hana Bezalel, Dotan Ankri, Ruojin Cai, Hadar Averbuch-Elor
TL;DR
This work tackles estimating the relative 3D rotation between non-overlapping real-world images with a Transformer-based rotation estimation model that leverages LoFTR features and auxiliary channels. It contributes the ExtremeLandmarkPairs dataset to benchmark extreme-view rotations in the wild and presents a progressive training pipeline that starts from panorama crops and extends to real Internet data with FoV and appearance augmentations. The approach achieves state-of-the-art performance on non-overlapping in-the-wild pairs while remaining competitive on overlapping pairs cropped from panoramas, highlighting the value of real-world data and multi-modal cues for robust pose estimation. The dataset and methodology have practical implications for camera localization and large-scale 3D reconstruction in unconstrained settings, with room for further improvement via enhanced augmentations and multi-view extensions.
Abstract
We present a technique and benchmark dataset for estimating the relative 3D orientation between a pair of Internet images captured in an extreme setting, where the images have limited or non-overlapping fields of view. Prior work targeting extreme rotation estimation assumes constrained 3D environments and emulates perspective images by cropping regions from panoramic views. However, real images captured in the wild are highly diverse, exhibiting variation in both appearance and camera intrinsics. In this work, we propose a Transformer-based method for estimating relative rotations in extreme real-world settings, and contribute the ExtremeLandmarkPairs dataset, assembled from scene-level Internet photo collections. Our evaluation demonstrates that our approach succeeds in estimating the relative rotations in a wide variety of extreme-view Internet image pairs, outperforming various baselines, including dedicated rotation estimation techniques and contemporary 3D reconstruction methods.
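To make the estimated quantity concrete, the following minimal sketch computes the relative rotation between two cameras and the angular (geodesic) distance between a predicted and a reference rotation. It is an illustration only, not the paper's implementation: the world-to-camera convention, the helper names (`rotation_y`, `relative_rotation`, `geodesic_angle_deg`), and the use of geodesic distance as the error measure are assumptions that the abstract does not state.

```python
import numpy as np


def rotation_y(deg):
    """Rotation about the y-axis (yaw) by `deg` degrees. Hypothetical helper."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def relative_rotation(R1, R2):
    """Rotation taking camera-1 coordinates to camera-2 coordinates.

    Assumes R1 and R2 are world-to-camera rotation matrices.
    """
    return R2 @ R1.T


def geodesic_angle_deg(Ra, Rb):
    """Angle in degrees of the rotation that separates Ra from Rb."""
    R = Ra.T @ Rb
    cos_theta = (np.trace(R) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


# Two cameras whose viewing directions differ by 120 degrees of yaw; with
# typical fields of view such a pair has little or no visual overlap.
R1 = rotation_y(10.0)
R2 = rotation_y(130.0)
R_rel = relative_rotation(R1, R2)

# A hypothetical prediction that is off by 5 degrees of yaw.
R_pred = rotation_y(115.0)

relative_angle = geodesic_angle_deg(np.eye(3), R_rel)
prediction_error = geodesic_angle_deg(R_pred, R_rel)
```

Because the relative rotation depends only on the two camera orientations and not on any shared scene content, it remains well defined for non-overlapping pairs, where correspondence-based pipelines have nothing to match.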
