Precise GPS-Denied UAV Self-Positioning via Context-Enhanced Cross-View Geo-Localization

Yuanze Xu, Ming Dai, Wenxiao Cai, Wankou Yang

TL;DR

CEUSP addresses GPS-denied UAV self-positioning by reframing it as cross-view geo-localization and contributes a multi-component framework: Rubik's Cube Attention (RCA) for multi-dimensional feature interaction, Context-Aware Channel Integration (CACI) for cross-dimensional attention, and a Dynamic Sampling Strategy (DSS) to curate challenging negatives. Built on a ConvNeXt-T backbone, CEUSP jointly optimizes three losses, a representation loss $\mathcal{L}_{rpt}$, a metric loss $\mathcal{L}_{mtc}$, and a mutual learning loss $\mathcal{L}_{kl}$, achieving state-of-the-art results on DenseUAV ($R@1=89.45\%$, $AP=79.62\%$) and competitive performance on University-1652. The method demonstrates robust urban localization under dense sampling and spatial perturbations, which the authors attribute to RCA+CACI-driven global semantic extraction and to the adaptive DSS, which balances geographic relevance with feature diversity. These components enable precise UAV self-positioning in GPS-denied settings and generalize to cross-view geo-localization tasks beyond UAVs.
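The idea of a sampling batch that mixes geography-based and feature-based negatives can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the figure captions state that the ratio of the two sample types is adjusted progressively during training, but the linear schedule, the `start` and `end` values, and the function names `geo_fraction` and `mix_negatives` are all hypothetical.

```python
import random


def geo_fraction(epoch, total_epochs, start=0.8, end=0.2):
    """Share of geography-based negatives at a given epoch.

    Assumption: a linear schedule from `start` to `end`. The source only says
    the ratio changes progressively; the shape and direction are placeholders.
    """
    if total_epochs <= 1:
        return end
    t = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + t * (end - start)


def mix_negatives(geo_pool, feat_pool, k, frac_geo, rng=None):
    """Draw k negatives: some from geographic neighbours, the rest from
    candidates ranked as similar in feature space (both pools are given)."""
    rng = rng or random.Random()
    n_geo = min(round(k * frac_geo), len(geo_pool))
    n_feat = min(k - n_geo, len(feat_pool))
    return rng.sample(geo_pool, n_geo) + rng.sample(feat_pool, n_feat)
```

Keeping the schedule separate from the sampler means the direction of the shift (more geographic or more feature-driven over time) is set purely by `start` and `end`.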

Abstract

Image retrieval has been employed as a robust complementary technique to address the challenge of Unmanned Aerial Vehicle (UAV) self-positioning. However, most existing methods focus on localizing objects captured by UAVs through complex part-based representations, often overlooking the challenges specific to UAV self-positioning, such as fine-grained spatial discrimination requirements and dynamic scene variations. To address these issues, we propose the Context-Enhanced method for precise UAV Self-Positioning (CEUSP), designed specifically for UAV self-positioning tasks. CEUSP integrates a Dynamic Sampling Strategy (DSS) to efficiently select optimal negative samples, while the Rubik's Cube Attention (RCA) module, combined with the Context-Aware Channel Integration (CACI) module, enhances feature representation and discrimination by exploiting interdimensional interactions, inspired by the rotational mechanics of a Rubik's Cube. Extensive experiments validate the effectiveness of the proposed method, demonstrating notable improvements in feature representation and UAV self-positioning accuracy within complex urban environments. Our approach achieves state-of-the-art performance on the DenseUAV dataset, which is specifically designed for dense urban contexts, and also delivers competitive results on the widely recognized University-1652 benchmark.
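To make the retrieval formulation concrete, the following is a minimal sketch of ranking satellite gallery features against one drone query feature. The framework overview states that test-time features are matched using cosine similarity; everything else here (the function name `rank_gallery`, the `top_k` parameter, and the use of NumPy arrays as stand-ins for extracted features) is an assumption for illustration.

```python
import numpy as np


def rank_gallery(query_feat, gallery_feats, top_k=5):
    """Return indices and cosine similarities of the top_k gallery entries.

    query_feat:    shape (D,)   feature of one drone image
    gallery_feats: shape (N, D) features of N satellite images
    """
    eps = 1e-12  # guards against division by zero for all-zero features
    q = query_feat / (np.linalg.norm(query_feat) + eps)
    g = gallery_feats / (np.linalg.norm(gallery_feats, axis=1, keepdims=True) + eps)
    sims = g @ q                        # cosine similarity per gallery entry
    order = np.argsort(-sims)[:top_k]   # highest similarity first
    return order, sims[order]
```

The index of the top-ranked satellite image identifies the location estimate, since each gallery image carries a known geographic position.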

Paper Structure

This paper contains 20 sections, 8 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: The Dynamic Sampling Strategy. Purple samples are selected from images exhibiting geographic proximity, while blue samples are derived from feature similarities identified between drone and satellite imagery. Within each sampling batch, the ratio of these two sample types is adjusted progressively as the training process advances.
  • Figure 2: Overview of our CEUSP framework. The framework integrates a Dynamic Sampling Strategy (DSS) to prioritize difficult negative samples during training. The Rubik's Cube Attention (RCA) module, combined with the Context-Aware Channel Integration (CACI) module, captures spatial-channel interactions, with weight sharing applied selectively to enhance performance. During testing, features extracted before the classification layer are matched using cosine similarity.
  • Figure 3: Comparison of R@1 and SDM@K metrics. SDM@K, normalized between 0 and 1, evaluates the spatial Euclidean distance between query and gallery images, assigning higher weights to closer matches. This metric balances retrieval accuracy with spatial precision, tolerating minor deviations while penalizing larger errors, making it particularly suitable for UAV self-positioning tasks.
  • Figure 4: Heatmaps generated by MCCG and CEUSP. CEUSP shows a greater focus on significant landmarks, while MCCG emphasizes central features. CEUSP effectively highlights critical elements, such as the pavilion and central intersection, demonstrating enhanced scene comprehension.
  • Figure 5: Illustration of Black Pad and Flip Pad methods. (a) Original drone image. (b) Image with a black padding block on the left and a corresponding strip removed from the right. (c) Image created by mirroring a strip on the left and cropping a strip from the right. (d) Original satellite image.
  • ...and 1 more figure
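The two perturbations described in the Figure 5 caption can be sketched as simple array operations. This is an illustrative reading of the caption, assuming a purely horizontal shift; the function names `black_pad` and `flip_pad` and the `shift` parameter (strip width in pixels) are hypothetical, and the paper's own implementation may differ.

```python
import numpy as np


def black_pad(img, shift):
    """Shift the image right by `shift` pixels: a black block fills the left,
    and a strip of the same width is dropped from the right."""
    width = img.shape[1]
    if not 0 <= shift < width:
        raise ValueError("shift must satisfy 0 <= shift < image width")
    out = np.zeros_like(img)
    out[:, shift:] = img[:, : width - shift]
    return out


def flip_pad(img, shift):
    """Same shift, but the left gap is filled with a mirrored copy of the
    image's left strip instead of black pixels."""
    width = img.shape[1]
    if not 0 <= shift < width:
        raise ValueError("shift must satisfy 0 <= shift < image width")
    mirrored = img[:, :shift][:, ::-1]  # left strip, flipped horizontally
    return np.concatenate([mirrored, img[:, : width - shift]], axis=1)
```

Both functions preserve the image size and work on H×W or H×W×C arrays, because all slicing is applied along the width axis only.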