Cross-view geo-localization: a survey
Abhilash Durgam, Sidike Paheding, Vikas Dhiman, Vijay Devabhaktuni
TL;DR
Cross-view geo-localization aims to determine a scene's geographic location by matching ground-view imagery with overhead satellite or aerial views. The survey traces the field from pixel-wise geodetic alignment through feature-based methods to modern deep-learning approaches, highlighting orientation-aware encodings, capsule networks, and transformer-based architectures. It reviews key datasets such as CVUSA, CVACT, and VIGOR, and analyzes progress from early two-branch networks trained with contrastive losses to state-of-the-art attention and transformer frameworks such as TransGeo and MGTL, which push recall metrics toward the 90% range on standard benchmarks. The work underscores practical implications for automotive, robotics, AR, and UAV localization, and identifies future directions in transformer-based modeling, deformable attention, domain adaptation, and multi-task learning to address persistent cross-view challenges.
Abstract
Cross-view geo-localization has attracted considerable attention in computer vision, driven by the wide availability of geotagged datasets and advances in machine learning. This paper surveys state-of-the-art methods, techniques, and challenges in this domain, with a focus on feature-based and deep learning strategies. Feature-based methods rely on distinctive features to establish correspondences across disparate viewpoints, whereas deep learning-based methods use convolutional neural networks to embed view-invariant attributes. This work also describes the challenges encountered in cross-view geo-localization, such as variations in viewpoint and illumination and the presence of occlusions, and discusses solutions that have been developed to address them. Furthermore, we review benchmark datasets and relevant evaluation metrics, and perform a comparative analysis of state-of-the-art techniques. Finally, we conclude with a discussion of prospective avenues for future research and the growing applications of cross-view geo-localization in an increasingly interconnected world.
