Cross-view geo-localization: a survey

Abhilash Durgam, Sidike Paheding, Vikas Dhiman, Vijay Devabhaktuni

TL;DR

Cross-view geo-localization aims to determine a scene's geographic location by matching ground-view imagery with overhead satellite or aerial views. The survey traces the field from pixel-wise geodetic alignment through feature-based methods to modern deep-learning approaches, highlighting orientation-aware encodings, capsule networks, and transformer-based architectures. It reviews key datasets such as CVUSA, CVACT, and VIGOR, and analyzes progress from early two-branch networks using contrastive losses to state-of-the-art attention and transformer frameworks like TransGeo and MGTL that push recall toward the 90% range on standard benchmarks. The work underscores practical implications for automotive, robotics, AR, and UAV localization, and identifies future directions in transformer-based modeling, deformable attention, domain adaptation, and multi-task learning to address persistent cross-view challenges.
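
Early two-branch networks learn a shared embedding space in which a ground image and its matching aerial image lie close together. The sketch below illustrates the classic margin-based contrastive loss for a single embedding pair, assuming embeddings are plain Python float vectors. The function names and the default margin are illustrative assumptions, and individual surveyed methods use their own loss variants (for example, triplet-style losses).

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    ground: Sequence[float],
    aerial: Sequence[float],
    is_match: bool,
    margin: float = 1.0,  # illustrative default, a tunable hyperparameter
) -> float:
    """Margin-based contrastive loss for one ground/aerial embedding pair.

    Matching pairs are penalised by their squared distance (pulled together);
    non-matching pairs are penalised only while they are closer than `margin`
    (pushed apart).
    """
    d = euclidean(ground, aerial)
    if is_match:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Usage: two embeddings that are 5.0 apart.
g, a = [0.0, 0.0], [3.0, 4.0]
print(contrastive_loss(g, a, is_match=True))               # 25.0
print(contrastive_loss(g, a, is_match=False, margin=6.0))  # 1.0
```

In training, such a loss would typically be averaged over a batch of matching and non-matching pairs produced by the two branches.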

Abstract

Cross-view geo-localization has attracted considerable attention in computer vision, driven by the growing availability of large geotagged datasets and by advances in machine learning. This paper surveys state-of-the-art methods, techniques, and open challenges in the field, with a focus on feature-based and deep learning strategies. Feature-based methods rely on distinctive features to establish correspondences across disparate viewpoints, whereas deep learning-based methods use convolutional neural networks to learn view-invariant embeddings. The survey also examines the main challenges of cross-view geo-localization, such as changes in viewpoint and illumination and the presence of occlusions, and reviews the solutions proposed to address them. We further describe benchmark datasets and evaluation metrics and provide a comparative analysis of state-of-the-art techniques. Finally, we discuss directions for future research and emerging applications of cross-view geo-localization.
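
Retrieval performance on benchmarks such as CVUSA and CVACT is usually reported as Recall@K: the share of ground-view queries whose correct aerial image appears among the K highest-scoring references. A minimal sketch, assuming a precomputed query-by-reference similarity matrix in which query i pairs with reference i (the helper name `recall_at_k` is hypothetical):

```python
from typing import List


def recall_at_k(similarity: List[List[float]], k: int) -> float:
    """Recall@K for cross-view retrieval.

    similarity[i][j] is the score between ground query i and aerial
    reference j (higher means more similar). Assumes one-to-one pairing,
    i.e. the correct aerial image for query i is reference i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        true_score = row[i]
        # 0-based rank of the true match: number of references scoring strictly
        # higher. Ties are resolved in the query's favour (an optimistic choice).
        rank = sum(1 for s in row if s > true_score)
        if rank < k:
            hits += 1
    return hits / len(similarity)


sim = [
    [0.9, 0.1, 0.2],  # query 0: true match ranked 1st
    [0.8, 0.5, 0.1],  # query 1: true match ranked 2nd
    [0.3, 0.4, 0.2],  # query 2: true match ranked 3rd
]
for k in (1, 2, 3):
    print(k, recall_at_k(sim, k))
```

Results are also commonly reported at the top 1% of the reference set, which corresponds roughly to k = 0.01 * N for N references; tie handling and rounding vary between implementations.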

Paper Structure (29 sections, 7 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: An illustrative example of the geo-localization problem.
  • Figure 2: A timeline of the geo-localization problem.
  • Figure 3: An illustrative example of training a pair of neural networks, similar to the approach proposed in lin2015learning.
  • Figure 4: Encoding orientation using color maps u and v for altitude and azimuth, as proposed by Liu_2019_CVPR.
  • Figure 5: The flowchart of the geo-localization method proposed by DBLP:journals/corr/abs-2005-03860. First, the aerial image undergoes a polar transformation. A dual-stream CNN then extracts features from the ground-level image and the polar-transformed aerial image. The correlation between these features is used to estimate the orientation of the ground image relative to its aerial counterpart. The aerial features are then shifted and cropped to the region that potentially corresponds to the ground-view features, and the similarity between the aligned features is used for location retrieval.
  • ...and 7 more figures
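
The polar transformation in the Figure 5 pipeline warps an aerial image so that its layout roughly resembles a ground panorama, narrowing the geometric gap before feature extraction. The following is a simplified sketch, not the cited authors' implementation: it assumes a square single-channel aerial image centred on the camera location, uses nearest-neighbour sampling where real pipelines typically interpolate, and adopts an arbitrary azimuth convention. `polar_transform` is a hypothetical name.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image as a list of pixel rows


def polar_transform(aerial: Image, out_h: int, out_w: int) -> Image:
    """Resample a square aerial image into a panorama-like polar layout.

    Output column j sweeps the azimuth from 0 to 2*pi around the image centre.
    Output row 0 samples the outer edge of the aerial image and later rows move
    towards the centre, so distant content ends up near the top and the area
    around the camera near the bottom, as in a ground panorama.
    """
    size = len(aerial)  # assumes a size x size input
    half = size / 2.0
    out: Image = []
    for i in range(out_h):
        radius = half * (out_h - i) / out_h  # shrinks towards the centre
        row = []
        for j in range(out_w):
            theta = 2.0 * math.pi * j / out_w
            # Assumed convention: theta = 0 points to the top ("north") of the image.
            y = half - radius * math.cos(theta)
            x = half + radius * math.sin(theta)
            # Nearest-neighbour lookup, clamped to valid pixel indices.
            r = min(size - 1, max(0, int(y)))
            c = min(size - 1, max(0, int(x)))
            row.append(aerial[r][c])
        out.append(row)
    return out


aerial = [[float(4 * r + c) for c in range(4)] for r in range(4)]
polar = polar_transform(aerial, out_h=2, out_w=4)
print(len(polar), len(polar[0]))
```

With this layout, rotating the ground camera about its vertical axis appears as a circular shift of the output columns, which is what makes orientation estimation by correlation along the azimuth axis possible.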
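
Orientation estimation in the Figure 5 pipeline can be pictured as sliding the ground-view features circularly along the azimuth axis of the polar-transformed aerial features and keeping the shift with the highest correlation; the aerial features are then shifted and cropped to the ground field of view. This sketch simplifies the method under stated assumptions: each feature map is reduced to one vector per azimuth column, correlation is an unnormalised dot product, and the names `estimate_orientation` and `dot` are hypothetical.

```python
from typing import List, Sequence, Tuple

Features = List[List[float]]  # one feature vector per azimuth column


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Unnormalised dot product of two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def estimate_orientation(ground: Features, aerial: Features) -> Tuple[int, Features]:
    """Circularly slide ground features along the azimuth axis of aerial features.

    Returns the column shift with the highest correlation score and the aerial
    features shifted and cropped to the ground field of view.
    """
    w_a, w_g = len(aerial), len(ground)
    assert w_g <= w_a, "ground view cannot be wider than the 360-degree aerial strip"
    best_shift, best_score = 0, float("-inf")
    for shift in range(w_a):
        # Indices wrap because azimuth is periodic: column 0 follows column w_a - 1.
        score = sum(dot(ground[j], aerial[(shift + j) % w_a]) for j in range(w_g))
        if score > best_score:  # strict '>' keeps the first shift on ties
            best_shift, best_score = shift, score
    cropped = [aerial[(best_shift + j) % w_a] for j in range(w_g)]
    return best_shift, cropped


aerial = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
ground = [[0.0, 0.0], [2.0, -1.0]]
shift, crop = estimate_orientation(ground, aerial)
print(shift, crop)
```

Under this sketch's convention, a shift of s columns on a 360-degree aerial strip of width w_a corresponds to an azimuth offset of roughly 360 * s / w_a degrees.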