
RHO: Robust Holistic OSM-Based Metric Cross-View Geo-Localization

Junwei Zheng, Ruize Dai, Ruiping Liu, Zichao Zeng, Yufan Chen, Fangjinhua Wang, Kunyu Peng, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

Abstract

Metric Cross-View Geo-Localization (MCVGL) aims to estimate the 3-DoF camera pose (position and heading) by matching ground and satellite images. In this work, instead of pinhole and satellite images, we study robust MCVGL using holistic panoramas and OpenStreetMap (OSM). To this end, we establish a large-scale MCVGL benchmark dataset, CV-RHO, with over 2.7M images covering different weather and lighting conditions as well as sensor noise. Furthermore, we propose a model termed RHO with a two-branch Pin-Pan architecture for accurate visual localization. A Split-Undistort-Merge (SUM) module is introduced to address panoramic distortion, and a Position-Orientation Fusion (POF) mechanism is designed to enhance localization accuracy. Extensive experiments demonstrate the value of our CV-RHO dataset and the effectiveness of the RHO model, with performance gains of up to 20% over state-of-the-art baselines. Project page: https://github.com/InSAI-Lab/RHO.
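The abstract names the Split-Undistort-Merge (SUM) module but does not describe its internals here. As a rough, hypothetical illustration of the split and undistort idea only, the sketch below resamples an equirectangular panorama into several pinhole views by perspective (gnomonic) projection, which removes the equirectangular stretching inside each view. The view count (4), the 90° per-view FoV, nearest-neighbour sampling and the image-level concatenation used as a stand-in for the merge step are all assumptions for this example; RHO's actual merge may operate on learned features instead, and the function names `pano_to_pinhole`, `split_undistort` and `merge_views` are hypothetical.

```python
import numpy as np


def pano_to_pinhole(pano, yaw_deg, fov_deg=90.0, out_size=64):
    """Resample one distortion-free pinhole view from an equirectangular panorama.

    pano: (H, W) or (H, W, C) array spanning 360 deg of longitude and 180 deg of
    latitude, with longitude 0 at the centre column (assumed convention).
    yaw_deg: heading of the virtual camera's optical axis.
    """
    H, W = pano.shape[:2]
    # Focal length (pixels) that yields the requested horizontal field of view.
    f = 0.5 * out_size / np.tan(np.deg2rad(fov_deg) / 2.0)
    # Pixel grid of the virtual camera, centred on the principal point.
    c = (out_size - 1) / 2.0
    u, v = np.meshgrid(np.arange(out_size) - c, np.arange(out_size) - c)
    # Ray per pixel in camera coordinates: x right (u), y down (v), z forward (f).
    lon = np.arctan2(u, f) + np.deg2rad(yaw_deg)
    lat = np.arctan2(-v, np.hypot(u, f))
    # Spherical angles -> panorama pixel indices (nearest neighbour, wrap in longitude).
    col = ((lon / (2.0 * np.pi) + 0.5) % 1.0) * W
    row = (0.5 - lat / np.pi) * H
    col = np.clip(np.floor(col).astype(int), 0, W - 1)
    row = np.clip(np.floor(row).astype(int), 0, H - 1)
    return pano[row, col]


def split_undistort(pano, num_views=4, fov_deg=90.0, out_size=64):
    """Split: evenly spaced yaws. Undistort: perspective resampling of each view."""
    yaws = np.arange(num_views) * 360.0 / num_views
    return np.stack([pano_to_pinhole(pano, y, fov_deg, out_size) for y in yaws])


def merge_views(views):
    """Hypothetical merge: place the views side by side in yaw order (a 360 deg strip)."""
    return np.concatenate(list(views), axis=1)


# Toy panorama whose pixel value equals its column index (a longitude label).
pano = np.tile(np.arange(360, dtype=float), (90, 1))
views = split_undistort(pano)  # shape (4, 64, 64)
strip = merge_views(views)     # shape (64, 256)
```

With four 90° views the yaws tile the full circle, so the concatenated strip covers 360°; a learned model would more likely merge per-view feature maps than raw pixels.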

Paper Structure

This paper contains 16 sections, 8 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: OrienterNet [sarlin2023orienternet] fails under adverse conditions. PR and OR stand for Position Recall and Orientation Recall, respectively.
  • Figure 2: Data samples of our CV-RHO dataset.
  • Figure 3: Overview of the RHO model with its two-branch Pin-Pan architecture. Given the current panorama as input, the panoramic branch with the Split-Undistort-Merge (SUM) module produces a 360°-FoV feature, while the pinhole branch focuses on dense local features within a 120° FoV. Both features are matched against the OSM feature map and then fused by the proposed Position-Orientation Fusion (POF).
  • Figure 4: Position-Orientation Fusion (POF) module of the RHO model. Given both the 360°-FoV and 120°-FoV volumes, POF first combines the pinhole volume $\mathbf{S}_{1}$ with $\mathrm{Prior}_{uv}$ and then the panoramic volume $\mathbf{S}_{pano}$ with $\mathrm{Prior}_{\theta}$.
  • Figure 5: More samples of generated images. Images with red borders are original images.
  • ...and 2 more figures
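Figure 1 evaluates localization with Position Recall (PR) and Orientation Recall (OR), and Figures 3 and 4 describe matching both branches against the OSM feature map to obtain score volumes that POF combines with priors. The captions do not say how the priors are built or applied, so the sketch below is only a hypothetical illustration under stated assumptions: a volume is a (V, U, T) array of log-scores over map rows, map columns and headings; "combining a volume with a prior" is modelled as adding a log-prior broadcast over the missing axes (a product in probability space); the pose is read off at the volume's maximum; and PR/OR are the fraction of samples whose position error or wrapped heading error is within a threshold. All function names, the 0.5 m cell size and the 1 m / 5° thresholds are assumptions, not values from the paper.

```python
import numpy as np


def log_normalize(x):
    """Shift log-scores so that exp(x) sums to 1 over the whole volume."""
    m = x.max()
    return x - (m + np.log(np.exp(x - m).sum()))


def fuse_with_prior(log_volume, prior, eps=1e-12):
    """Hypothetical stand-in for 'volume combined with a prior'.

    log_volume: (V, U, T) log-scores over map rows, map columns and headings.
    prior: non-negative weights broadcastable to (V, U, T), e.g. a position
    prior of shape (V, U, 1) or an orientation prior of shape (1, 1, T).
    """
    return log_normalize(log_volume + np.log(prior + eps))


def read_pose(log_volume, cell_m, headings_deg):
    """Return (x_m, y_m, heading_deg) at the maximum of a (V, U, T) volume."""
    v, u, t = np.unravel_index(np.argmax(log_volume), log_volume.shape)
    return u * cell_m, v * cell_m, float(headings_deg[t])


def heading_error_deg(pred, gt):
    """Absolute heading difference in degrees, wrapped into [0, 180]."""
    d = np.abs(np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)) % 360.0
    return np.minimum(d, 360.0 - d)


def recall_at(errors, threshold):
    """Fraction of samples whose error does not exceed the threshold."""
    return float(np.mean(np.asarray(errors, dtype=float) <= threshold))


rng = np.random.default_rng(0)
V, U, T = 32, 32, 64
headings = np.arange(T) * 360.0 / T

# Toy pinhole volume sharpened by a one-hot position prior of shape (V, U, 1).
s_pin = rng.normal(size=(V, U, T))
prior_uv = np.zeros((V, U, 1))
prior_uv[10, 20, 0] = 1.0
fused = fuse_with_prior(s_pin, prior_uv)
x_m, y_m, heading = read_pose(fused, cell_m=0.5, headings_deg=headings)

# Toy panoramic volume sharpened by a one-hot orientation prior of shape (1, 1, T).
s_pano = rng.normal(size=(V, U, T))
prior_theta = np.zeros((1, 1, T))
prior_theta[0, 0, 16] = 1.0
fused_pano = fuse_with_prior(s_pano, prior_theta)
_, _, heading_pano = read_pose(fused_pano, cell_m=0.5, headings_deg=headings)

# PR / OR over three hypothetical predictions (1 m and 5 deg thresholds).
pos_err = np.hypot([0.4, 2.0, 1.5], [0.3, 1.0, 0.1])
ori_err = heading_error_deg([350.0, 90.0, 45.0], [10.0, 92.0, 44.0])
pr_1m = recall_at(pos_err, 1.0)
or_5deg = recall_at(ori_err, 5.0)
```

The prior shapes are the only part tied to Figure 4: a (V, U, 1) prior re-weights positions while leaving headings free, and a (1, 1, T) prior does the opposite. The heading error wraps around 360°, so a prediction of 350° against a ground truth of 10° counts as a 20° error rather than 340°.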