Game4Loc: A UAV Geo-Localization Benchmark from Game Data

Yuxiang Ji, Boyong He, Zhuoyue Tan, Liaoni Wu

TL;DR

This work tackles UAV geo-localization in GPS-denied environments by introducing GTA-UAV, a large-scale game-based benchmark that enables partial cross-view matching between drone-view and satellite-view data across multiple altitudes and attitudes. It proposes weighted-InfoNCE, a contrastive loss with IoU-derived weights, together with a mutually exclusive sampling strategy, to train models that align partially overlapping drone and satellite views, and it extends image retrieval to distance-based localization. Experiments show that the proposed method achieves state-of-the-art performance on GTA-UAV and transfers effectively to real UAV data (UAV-VisLoc), improving both zero-shot and fine-tuned results and reducing localization error. Overall, the dataset and the partial-matching training paradigm narrow the gap between synthetic, contiguous-area data and practical UAV geo-localization, enabling more robust GPS-denied operation.

Abstract

Vision-based geo-localization for UAVs serves as a secondary source of GPS information alongside global navigation satellite systems (GNSS) and can still operate independently in GPS-denied environments. Recent deep-learning-based methods frame this as an image matching and retrieval task: by retrieving drone-view images in a geo-tagged satellite image database, approximate localization information can be obtained. However, due to high costs and privacy concerns, it is usually difficult to obtain large quantities of drone-view images from a continuous area. Existing drone-view datasets mostly consist of small-scale aerial photography and rest on the strong assumption that a perfectly one-to-one aligned reference image exists for every query, which leaves a significant gap from practical localization scenarios. In this work, we use modern computer games to construct GTA-UAV, a large-range, contiguous-area UAV geo-localization dataset featuring multiple flight altitudes, attitudes, scenes, and targets. Based on this dataset, we introduce a more practical UAV geo-localization task that includes partial matches of cross-view paired data, and we extend image-level retrieval to actual localization measured in distance (meters). For the constructed drone-view and satellite-view pairs, we adopt a weight-based contrastive learning approach, which allows for effective learning while avoiding additional post-processing matching steps. Experiments demonstrate the effectiveness of our data and training method for UAV geo-localization, as well as their generalization to real-world scenarios.
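
To make the idea of weight-based contrastive learning concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the loss formula, so two things here are assumptions: the hypothetical `iou_to_weight` mapping (a sigmoid with a made-up steepness `k`), and the soft-target form of `weighted_info_nce`, in which the matched satellite tile receives target mass `w` and the remaining `1 - w` is spread uniformly over the other tiles in the batch. With every weight set to 1 the sketch reduces to the ordinary one-directional InfoNCE loss.

```python
import math

def iou_to_weight(iou, k=5.0):
    """Map an IoU in [0, 1] to a pair weight (assumed sigmoid form, k is hypothetical)."""
    return 1.0 / (1.0 + math.exp(-k * iou))

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def weighted_info_nce(sim, weights, temperature=0.1):
    """Drone-to-satellite contrastive loss with per-pair soft targets.

    sim[i][j]  : similarity between drone image i and satellite tile j;
                 the diagonal holds the matched (positive or semi-positive) pairs.
    weights[i] : IoU-derived weight in [0, 1] for matched pair i.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logp = log_softmax([s / temperature for s in sim[i]])
        # Target mass left over after the matched tile, shared by the other tiles.
        off = (1.0 - weights[i]) / (n - 1) if n > 1 else 0.0
        total -= sum((weights[i] if j == i else off) * logp[j] for j in range(n))
    return total / n
```

Under this assumed form, a semi-positive pair with a low IoU pulls its two embeddings together less strongly than a nearly aligned pair does, which is the behaviour the partial-matching setting calls for.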

Paper Structure

This paper contains 21 sections, 2 equations, 4 figures, 6 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Comparison between a perfect matching pair and a partial matching pair.
  • Figure 2: The paired-data construction process of GTA-UAV, where positive and semi-positive satellite-view images are paired with each drone-view image by IoU.
  • Figure 3: Overview of our training and inference pipeline. (left) We use ViT as the feature encoder and weighted-InfoNCE to train on batches of positive and semi-positive samples drawn by mutually exclusive sampling. (right) Retrieval is then performed on the discriminative features to achieve localization.
  • Figure 4: Meter-level localization accuracy of different methods in the (left) cross-area and (right) same-area settings.
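
The IoU-based pairing in Figure 2 and the meter-level evaluation in Figure 4 can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the paper: ground footprints are treated as axis-aligned rectangles in planar map coordinates (meters), the thresholds `pos_thr` and `semi_thr` are placeholder values, and the localization error is taken as the Euclidean distance from the drone's true position to the center of the retrieved satellite tile. The function names are hypothetical.

```python
import math

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax) in meters."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def pair_label(iou, pos_thr=0.4, semi_thr=0.15):
    """Classify a drone/satellite pair by IoU (threshold values are placeholders)."""
    if iou >= pos_thr:
        return "positive"
    if iou >= semi_thr:
        return "semi-positive"
    return "negative"

def localization_error_m(query_xy, tile):
    """Distance in meters from the true drone position to the retrieved tile's center."""
    cx = (tile[0] + tile[2]) / 2.0
    cy = (tile[1] + tile[3]) / 2.0
    return math.hypot(query_xy[0] - cx, query_xy[1] - cy)
```

For example, a 100 m by 100 m drone footprint shifted 50 m sideways from a tile of the same size overlaps it with an IoU of 1/3, which counts as semi-positive under these placeholder thresholds. Real drone footprints vary with altitude and attitude, so a polygon intersection would be needed for tilted views.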