Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization
Tao Liu, Kan Ren, Qian Chen
TL;DR
This work tackles cross-view UAV localization in GNSS-denied environments by reframing image matching as graph-based relational reasoning over regions extracted by an object detector. It combines a dual-graph representation (spatial and semantic) with a Graph Attention Network to learn UAV-to-satellite correspondences, and is trained with multi-task losses covering graph-node matching, embedding, and scene classification. The approach shows strong cross-view and cross-modal performance on public and infrared-visible datasets, and ablations confirm the value of semantic cues, global features, and dynamic loss weighting. The practical impact is robust, efficient localization across time, viewpoint, and modality gaps, together with a publicly released infrared dataset to support future research and evaluation.
Abstract
With the rapid growth of the low-altitude economy, UAVs have become crucial for measurement and tracking in patrol systems. However, in GNSS-denied areas, GNSS-based positioning is prone to failure. This paper presents a cross-view UAV localization framework that performs map matching via object detection, to address cross-temporal, cross-view, heterogeneous aerial image matching. In typical pipelines, UAV visual localization is formulated as an image-retrieval problem: features are extracted to build a localization map, and the pose of a query image is estimated by matching it to a reference database with known poses. Because publicly available UAV localization datasets are limited, many approaches recast localization as a classification task and rely on the scene labels in these datasets to ensure accuracy. Other methods seek to reduce cross-domain differences using polar-coordinate reprojection, perspective transformations, or generative adversarial networks; however, they can suffer from misalignment, content loss, and limited realism. In contrast, we leverage modern object detection to accurately extract salient instances from UAV and satellite images, and integrate a graph neural network to reason about inter-image and intra-image node relationships. Using a fine-grained, graph-based node-similarity metric, our method achieves strong retrieval and localization performance. Extensive experiments on public and real-world datasets show that our approach handles heterogeneous appearance differences effectively and generalizes well, making it applicable to scenarios with larger modality gaps, such as infrared-visible image matching. Our dataset will be publicly available at https://github.com/liutao23/ODGNNLoc.git.
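To make the retrieval formulation above concrete, the following is a minimal sketch of ranking reference tiles by a node-level similarity score. It is not the paper's metric: the symmetric best-match cosine score, the function names (`node_similarity_score`, `rank_references`), the 16-dimensional random embeddings, and the tile names are all assumptions chosen for illustration. In the actual framework, node embeddings would come from detected object regions refined by the graph network.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row (one node embedding) to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def node_similarity_score(query_nodes, ref_nodes):
    """Hypothetical graph-level score built from node-level cosine similarity.

    Each query node is paired with its most similar reference node and vice
    versa; the two mean best-match similarities are averaged so the score is
    symmetric in its arguments.
    """
    q = l2_normalize(np.asarray(query_nodes, dtype=float))
    r = l2_normalize(np.asarray(ref_nodes, dtype=float))
    sim = q @ r.T  # (num_query_nodes, num_ref_nodes) cosine similarities
    return float(0.5 * (sim.max(axis=1).mean() + sim.max(axis=0).mean()))

def rank_references(query_nodes, database):
    """Return (tile_name, score) pairs sorted from best to worst match."""
    scores = {name: node_similarity_score(query_nodes, nodes)
              for name, nodes in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data (assumption): random embeddings stand in for detector/GNN outputs.
rng = np.random.default_rng(0)
database = {f"tile_{i}": rng.normal(size=(rng.integers(3, 7), 16))
            for i in range(5)}

# A "UAV query" simulated as a lightly perturbed copy of one satellite tile.
query = database["tile_3"] + 0.05 * rng.normal(size=database["tile_3"].shape)

ranking = rank_references(query, database)
print(ranking[0][0])
```

Because each reference tile has a known geographic position, the top-ranked tile gives the location estimate for the query. Scoring at the node level, rather than comparing one global descriptor per image, is what lets individual detected objects contribute to the match.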
