360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries
Huajian Huang, Changkun Liu, Yipeng Zhu, Hui Cheng, Tristan Braud, Sai-Kit Yeung
TL;DR
360Loc is the first dataset and benchmark for cross-device omnidirectional visual localization, providing ground-truth 6DoF poses for 360° reference imagery and cross-device queries from pinhole, fisheye, and 360° cameras. It introduces a practical 360° mapping pipeline that fuses LiDAR with 360° imagery to generate accurate ground truth, and a virtual camera framework to generate lower-FoV query views from 360° data for fair cross-device comparisons. The benchmark thoroughly evaluates both feature-matching-based methods and absolute pose regressors, showing that omnidirectional data improves localization in challenging, symmetric environments and that virtual-camera augmentation reduces cross-device domain gaps, enhancing generalization. The results offer new insights into 360° mapping, cross-device localization, and the role of FoV in retrieval, matching, and regression-based localization, with practical impact for robotics, AR, and large-scale environment modeling.
Abstract
Portable 360$^\circ$ cameras are becoming a cheap and efficient tool to establish large visual databases. By capturing omnidirectional views of a scene, these cameras could expedite building environment models that are essential for visual localization. However, this advantage is often overlooked due to the lack of suitable datasets. This paper introduces a new benchmark dataset, 360Loc, composed of 360$^\circ$ images with ground truth poses for visual localization. We present a practical implementation of 360$^\circ$ mapping that combines 360$^\circ$ images with LiDAR data to generate the ground truth 6DoF poses. 360Loc is the first dataset and benchmark that explores the challenge of cross-device visual localization, involving 360$^\circ$ reference frames and query frames from pinhole, ultra-wide FoV fisheye, and 360$^\circ$ cameras. We propose a virtual camera approach to generate lower-FoV query frames from 360$^\circ$ images, which ensures a fair comparison of performance among different query types in visual localization tasks. We also extend this virtual camera approach to feature matching-based and pose regression-based methods to alleviate the performance loss caused by the cross-device domain gap, and evaluate its effectiveness against state-of-the-art baselines. We demonstrate that omnidirectional visual localization is more robust in challenging large-scale scenes with symmetries and repetitive structures. These results provide new insights into 360$^\circ$ camera mapping and omnidirectional visual localization with cross-device queries.
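
To make the virtual camera idea concrete, the following is a minimal sketch of how a lower-FoV pinhole view could be sampled from an equirectangular 360$^\circ$ image. It is not the authors' implementation: the function name `virtual_pinhole_from_equirect`, its parameters, the axis convention (x right, y down, z forward), and the nearest-neighbor sampling are all assumptions chosen for illustration, and lens distortion (as needed for fisheye queries) is not modeled.

```python
import numpy as np

def virtual_pinhole_from_equirect(pano, fov_deg=90.0, out_hw=(240, 320),
                                  yaw_deg=0.0, pitch_deg=0.0):
    """Sample an ideal pinhole view from an equirectangular panorama.

    Hypothetical helper for illustration; uses nearest-neighbor lookup.
    """
    H, W = pano.shape[:2]
    h, w = out_hw
    # Focal length in pixels, derived from the horizontal field of view.
    f = 0.5 * w / np.tan(0.5 * np.radians(fov_deg))
    # Back-project pixel centers to unit rays (x right, y down, z forward).
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    rays = np.stack([(u - 0.5 * w) / f, (v - 0.5 * h) / f, np.ones_like(u)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # Orient the virtual camera: pitch about x, then yaw about y.
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    Ry = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(yaw), 0.0, np.cos(yaw)]])
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(pitch), -np.sin(pitch)],
                   [0.0, np.sin(pitch), np.cos(pitch)]])
    rays = rays @ (Ry @ Rx).T
    # Convert rays to longitude/latitude on the sphere.
    lon = np.arctan2(rays[..., 0], rays[..., 2])        # in [-pi, pi]
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))   # in [-pi/2, pi/2]
    # Map angles to panorama pixel indices (wrap horizontally, clamp vertically).
    px = ((lon / (2.0 * np.pi) + 0.5) * W).astype(int) % W
    py = np.clip(((lat / np.pi + 0.5) * H).astype(int), 0, H - 1)
    return pano[py, px]

# Toy panorama whose pixel value equals its column index.
pano = np.tile(np.arange(360), (180, 1))
view_front = virtual_pinhole_from_equirect(pano, fov_deg=90.0)
view_right = virtual_pinhole_from_equirect(pano, fov_deg=90.0, yaw_deg=90.0)
```

Because every virtual view is rendered from the same 360$^\circ$ capture, the camera position is shared across query types and only the field of view and orientation change, which is what allows a like-for-like comparison in this sketch.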
