
360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries

Huajian Huang, Changkun Liu, Yipeng Zhu, Hui Cheng, Tristan Braud, Sai-Kit Yeung

TL;DR

360Loc is the first dataset and benchmark for cross-device omnidirectional visual localization, providing ground-truth 6DoF poses for 360° reference imagery and cross-device queries from pinhole, fisheye, and 360° cameras. It introduces a practical 360° mapping pipeline that fuses LiDAR with 360° imagery to generate accurate ground truth, and a virtual camera framework to generate lower-FoV query views from 360° data for fair cross-device comparisons. The benchmark thoroughly evaluates both feature-matching-based methods and absolute pose regressors, showing that omnidirectional data improves localization in challenging, symmetric environments and that virtual-camera augmentation reduces cross-device domain gaps, enhancing generalization. The results offer new insights into 360° mapping, cross-device localization, and the role of FoV in retrieval, matching, and regression-based localization, with practical impact for robotics, AR, and large-scale environment modeling.
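The virtual camera idea, rendering a lower-FoV view from 360° data with a randomly sampled orientation, can be illustrated with a short sketch. This is not the 360Loc code. It assumes an equirectangular panorama, a yaw/pitch/roll convention chosen only for this example, and nearest-neighbour sampling to avoid extra dependencies. The function names `rotation_from_yaw_pitch_roll` and `equirect_to_pinhole` are hypothetical.

```python
import numpy as np


def rotation_from_yaw_pitch_roll(yaw, pitch, roll):
    """Camera-to-panorama rotation from angles in radians.

    Assumed convention (illustrative only): R = Ry(yaw) @ Rx(pitch) @ Rz(roll),
    with y as the vertical axis and z as the optical axis.
    """
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    rz = np.array([[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]])
    return ry @ rx @ rz


def equirect_to_pinhole(pano, width, height, fov_deg, R):
    """Render a virtual pinhole view (horizontal FoV fov_deg) from an equirectangular pano.

    pano: (H, W) or (H, W, C) array. R: 3x3 rotation from the virtual-camera frame
    to the panorama frame. Sampling is nearest-neighbour.
    """
    H, W = pano.shape[:2]
    f = 0.5 * width / np.tan(0.5 * np.radians(fov_deg))  # focal length in pixels
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    # Back-project each pixel to a ray in the camera frame (x right, y down, z forward).
    rays = np.stack([(u - cx) / f, (v - cy) / f, np.ones_like(u, dtype=float)], axis=-1)
    rays = rays @ R.T  # rotate every ray into the panorama frame
    x, y, z = rays[..., 0], rays[..., 1], rays[..., 2]
    lon = np.arctan2(x, z)                 # longitude in [-pi, pi]
    lat = np.arctan2(-y, np.hypot(x, z))   # latitude in [-pi/2, pi/2], up is positive
    px = ((lon / (2.0 * np.pi) + 0.5) * W).astype(int) % W
    py = np.clip(((0.5 - lat / np.pi) * H).astype(int), 0, H - 1)
    return pano[py, px]


# Usage: crop one randomly oriented 90° pinhole query from a synthetic panorama.
rng = np.random.default_rng(0)
demo_pano = rng.integers(0, 256, size=(256, 512, 3), dtype=np.uint8)
demo_R = rotation_from_yaw_pitch_roll(rng.uniform(-np.pi, np.pi), rng.uniform(-0.3, 0.3), 0.0)
query = equirect_to_pinhole(demo_pano, 640, 480, 90.0, demo_R)  # shape (480, 640, 3)
```

Sampling random orientations and FoVs in this way is one plausible reading of "random poses and image cropping"; a fisheye virtual camera would replace the pinhole back-projection with a wide-angle model.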

Abstract

Portable 360° cameras are becoming a cheap and efficient tool for building large visual databases. By capturing omnidirectional views of a scene, these cameras could expedite building the environment models that are essential for visual localization. However, this advantage is often overlooked due to the lack of suitable datasets. This paper introduces a new benchmark dataset, 360Loc, composed of 360° images with ground-truth poses for visual localization. We present a practical implementation of 360° mapping that combines 360° images with LiDAR data to generate the ground-truth 6DoF poses. 360Loc is the first dataset and benchmark that explores the challenge of cross-device visual positioning, involving 360° reference frames and query frames from pinhole, ultra-wide-FoV fisheye, and 360° cameras. We propose a virtual camera approach to generate lower-FoV query frames from 360° images, which ensures a fair comparison of performance among different query types in visual localization tasks. We also extend this virtual camera approach to feature-matching-based and pose-regression-based methods to alleviate the performance loss caused by the cross-device domain gap, and evaluate its effectiveness against state-of-the-art baselines. We demonstrate that omnidirectional visual localization is more robust in challenging large-scale scenes with symmetries and repetitive structures. These results provide new insights into 360° camera mapping and omnidirectional visual localization with cross-device queries.
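Comparing query types against baselines comes down to per-query pose errors. The sketch below shows the common definitions: translation error as the distance between estimated and ground-truth camera centres, and rotation error as the angle of the relative rotation. It assumes camera-to-world poses given as a rotation matrix plus camera centre in metres. The `(0.5 m, 5°)` threshold is a placeholder, not a value taken from 360Loc, and the function names are hypothetical.

```python
import numpy as np


def pose_errors(R_est, c_est, R_gt, c_gt):
    """Return (translation error in metres, rotation error in degrees) for one query.

    Assumes camera-to-world poses: R is a 3x3 rotation, c the camera centre.
    """
    t_err = float(np.linalg.norm(np.asarray(c_est, float) - np.asarray(c_gt, float)))
    # Angle of the relative rotation: theta = arccos((trace(R_est^T R_gt) - 1) / 2)
    cos_theta = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    r_err = float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))
    return t_err, r_err


def summarize(errors, thresholds=((0.5, 5.0),)):
    """Median errors plus the fraction of queries within each (metres, degrees) threshold."""
    t = np.array([e[0] for e in errors])
    r = np.array([e[1] for e in errors])
    recall = {th: float(np.mean((t <= th[0]) & (r <= th[1]))) for th in thresholds}
    return float(np.median(t)), float(np.median(r)), recall
```

Clipping the cosine guards against values slightly outside [-1, 1] caused by floating-point error in nearly identical rotations.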

Paper Structure (25 sections, 10 equations, 10 figures, 17 tables)

Figures (10)

  • Figure 1: Overview of dataset collection and ground-truth generation: 1) Use the platform to collect 360° images and frame-by-frame point clouds, and obtain real-time camera poses; 2) Apply optimization to register the data into a globally reconstructed point cloud model, then align the daytime and nighttime models to obtain consistent poses; 3) Crop virtual camera images and generate the corresponding depth images. As a result, 360Loc takes advantage of 360° images for efficient mapping while providing query images from five different camera models to analyze the challenge of cross-domain visual localization.
  • Figure 2: The four scenes in 360Loc. All four contain symmetrical, repetitive structures and moving objects. Camera trajectories are visualized as spheres.
  • Figure 3: Illustration of obtaining virtual camera images through random poses and image cropping.
  • Figure 4: Overview of GT generation.
  • Figure 5: The average of the median translation/rotation errors (m/°) over the 4 scenes.
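Step 3 of Figure 1 pairs each virtual camera image with a depth image. One straightforward way to obtain such a depth map is to project the registered point cloud into the virtual pinhole camera and keep the nearest point per pixel (a z-buffer). The sketch below is only an illustration under that assumption: it takes a world-to-camera pose `X_c = R_cw X_w + t_cw` and a 3x3 intrinsic matrix `K`, and the function name `render_depth` is hypothetical rather than part of the 360Loc toolkit.

```python
import numpy as np


def render_depth(points_w, R_cw, t_cw, K, width, height):
    """Project world points into a pinhole camera, keeping the nearest depth per pixel.

    points_w: (N, 3) world-frame points. R_cw, t_cw: world-to-camera rotation and
    translation. K: 3x3 intrinsics. Returns a (height, width) depth map in the
    point cloud's units; pixels hit by no point are 0.
    """
    pts_c = np.asarray(points_w, float) @ R_cw.T + t_cw
    pts_c = pts_c[pts_c[:, 2] > 1e-6]  # drop points at or behind the camera
    uvw = pts_c @ K.T                  # homogeneous pixel coordinates
    # Round to the nearest pixel centre (pixel i covers [i - 0.5, i + 0.5)).
    u = np.floor(uvw[:, 0] / uvw[:, 2] + 0.5).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2] + 0.5).astype(int)
    z = pts_c[:, 2]
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]
    depth = np.full((height, width), np.inf)
    # z-buffer: when several points land on one pixel, keep the smallest depth.
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth
```

A dense LiDAR map gives sparse-to-semi-dense depth this way; filling holes or handling occlusion more carefully (for example with surface splatting or a mesh) would be a separate design choice.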