Cross-View Geo-Localization with Street-View and VHR Satellite Imagery in Decentrality Settings

Panwang Xia, Lei Yu, Yi Wan, Qiong Wu, Peiqi Chen, Liheng Zhong, Yongxiang Yao, Dong Wei, Xinyi Liu, Lixiang Ru, Yingying Zhang, Jiangwei Lao, Jingdong Chen, Ming Yang, Yongjun Zhang

TL;DR

This work introduces decentrality as a practical challenge in cross-view geo-localization between street-view and VHR satellite imagery, and presents DReSS, a large-scale dataset designed to stress-test CVGL under large decentrality across eight cities. To address this challenge, the authors propose AuxGeo, a multi-metric framework that adds a BEV-based intermediary (BIM) and a Position Constraint Module (PCM) during training, achieving improved cross-view matching without extra inference cost. The method delivers state-of-the-art results on public datasets (CVUSA, CVACT, VIGOR) and the new DReSS benchmark, particularly under increasing decentrality, and demonstrates robust cross-dataset generalization and favorable visualization of learned correspondences. These contributions advance practical CVGL for GNSS-denied environments relevant to disaster response and urban navigation, while also highlighting remaining gaps at very high decentrality.

Abstract

Cross-View Geo-Localization tackles the challenge of image geo-localization in GNSS-denied environments, including disaster response scenarios, urban canyons, and dense forests, by matching street-view query images with geo-tagged aerial-view reference images. However, current research often relies on benchmarks and methods that assume center-aligned settings or account for only limited decentrality, which we define as the offset of the query image relative to the reference image center. Such assumptions fail to reflect real-world scenarios, where reference databases are typically pre-established without the possibility of ensuring perfect alignment for each query image. Moreover, decentrality is a critical factor warranting deeper investigation, as larger decentrality can substantially improve localization efficiency but comes at the cost of declines in localization accuracy. To address this limitation, we introduce DReSS (Decentrality Related Street-view and Satellite-view dataset), a novel dataset designed to evaluate cross-view geo-localization with a large geographic scope and diverse landscapes, emphasizing the decentrality issue. Meanwhile, we propose AuxGeo (Auxiliary Enhanced Geo-Localization) to further study the decentrality issue, which leverages a multi-metric optimization strategy with two novel modules: the Bird's-eye view Intermediary Module (BIM) and the Position Constraint Module (PCM). These modules improve the localization accuracy despite the decentrality problem. Extensive experiments demonstrate that AuxGeo outperforms previous methods on our proposed DReSS dataset, mitigating the issue of large decentrality, and also achieves state-of-the-art performance on existing public datasets such as CVUSA, CVACT, and VIGOR.
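The abstract defines decentrality as the offset of the query image relative to the reference image center. The short Python sketch below makes that definition concrete for a single square reference tile. The function name `decentrality`, the pixel coordinate convention, and the normalization (largest per-axis offset divided by half the tile side) are all illustrative assumptions, not the formula or code used by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offset:
    dx: float     # signed horizontal offset from the tile center, in pixels
    dy: float     # signed vertical offset from the tile center, in pixels
    ratio: float  # 0.0 at the tile center, 1.0 on the tile border


def decentrality(query_xy, tile_size):
    """Offset of a query location from the center of a square reference tile.

    The normalization used here (largest per-axis offset divided by half the
    tile side) is an assumption made for illustration only; it is not taken
    from the paper.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    half = tile_size / 2.0
    dx = query_xy[0] - half
    dy = query_xy[1] - half
    return Offset(dx, dy, max(abs(dx), abs(dy)) / half)


if __name__ == "__main__":
    # Center-aligned query: no offset at all.
    print(decentrality((256, 256), 512))
    # Query halfway between the tile center and its right edge.
    print(decentrality((384, 256), 512))
```

Under this assumed normalization, a center-aligned benchmark corresponds to a ratio of 0, and a larger permitted ratio means each reference tile can serve queries over a wider area, which is the efficiency-versus-accuracy trade-off the abstract describes.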

Paper Structure

This paper contains 34 sections, 4 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Visualization of the decentrality issue. Red circles simulate the visible regions of street-view panoramas in VHR satellite reference images. Higher decentrality reduces global similarity, increasing the difficulty of establishing cross-view image correspondence.
  • Figure 2: (a) Comparison of the hit area of VIGOR (yellow box) and DReSS (red box). Four subsets are divided within the hit area of DReSS with rising decentrality. (b) Comparison of the coverage scope between VIGOR and DReSS.
  • Figure 3: Visualization of decentrality conditions across different datasets. The red star in CVUSA, CVACT and CVGlobal represents center alignment, while the yellow box in VIGOR and the red box in DReSS represent the hit areas, indicating different degrees of decentrality.
  • Figure 4: Aerial images of eight cities with diverse landscapes from across the world and the distributions of panoramas (red dots) in the DReSS dataset.
  • Figure 5: (A1) Overview of our proposed method, AuxGeo, with its two novel modules, BIM and PCM. (B) Illustration of the inference phase of AuxGeo, showing that the proposed modules act as components of the multi-metric optimization and incur no extra cost during inference. (A2) Illustration of the proposed PCM.
  • ...and 4 more figures