Precise GPS-Denied UAV Self-Positioning via Context-Enhanced Cross-View Geo-Localization
Yuanze Xu, Ming Dai, Wenxiao Cai, Wankou Yang
TL;DR
CEUSP addresses GPS-denied UAV self-positioning by reframing it as cross-view geo-localization and contributes a multi-component framework: Rubik's Cube Attention (RCA) for multi-dimensional feature interaction, Context-Aware Channel Integration (CACI) for cross-dimensional attention, and a Dynamic Sampling Strategy (DSS) to curate challenging negatives. Built on a ConvNeXt-T backbone, CEUSP jointly optimizes representation, metric, and mutual learning losses ($\mathcal{L}_{rpt}$, $\mathcal{L}_{mtc}$, and $\mathcal{L}_{kl}$), achieving state-of-the-art results on DenseUAV ($R@1=89.45\%$, $AP=79.62\%$) and competitive performance on University-1652. The method demonstrates robust urban localization under dense sampling and spatial perturbations, which the summary attributes to global semantic extraction driven by RCA and CACI and to the adaptive DSS, which balances geographic relevance with feature diversity. These components enable precise UAV self-positioning in GPS-denied settings and offer strong generalization to cross-view geo-localization tasks beyond UAVs.
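For readers unfamiliar with the R@1 figure quoted above, the following is a minimal sketch of how Recall@K is commonly computed in image retrieval. It is not taken from the CEUSP code: the function names, the use of cosine similarity, and the toy features and labels are all assumptions made for illustration. A query counts as a hit when at least one of its top-K ranked gallery images carries the same location label.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k=1):
    """Fraction of queries whose top-k retrieved gallery items include a match.

    Hypothetical helper: ranks the gallery by cosine similarity to each query
    and checks whether the query's label appears among the k best matches.
    """
    hits = 0
    for feat, label in zip(query_feats, query_labels):
        # Gallery indices sorted from most to least similar to this query.
        ranked = sorted(
            range(len(gallery_feats)),
            key=lambda j: cosine(feat, gallery_feats[j]),
            reverse=True,
        )
        if any(gallery_labels[j] == label for j in ranked[:k]):
            hits += 1
    return hits / len(query_feats)


# Toy data (assumed): three UAV-view queries and three satellite-view gallery items.
query_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_labels = [0, 1, 2]
gallery_feats = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.0]]
gallery_labels = [0, 1, 2]

r1 = recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k=1)
r3 = recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k=3)
```

In this toy setup the third query's true match is its least similar gallery item, so it is missed at K=1 and recovered only once K covers the whole gallery. Benchmarks such as DenseUAV may define the set of correct gallery matches differently (for example, by including neighboring locations), so the sketch should be read as the generic metric only.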
Abstract
Image retrieval has been employed as a robust complementary technique to address the challenge of Unmanned Aerial Vehicle (UAV) self-positioning. However, most existing methods primarily focus on localizing objects captured by UAVs through complex part-based representations, often overlooking the unique challenges associated with UAV self-positioning, such as fine-grained spatial discrimination requirements and dynamic scene variations. To address these issues, we propose the Context-Enhanced method for precise UAV Self-Positioning (CEUSP), specifically designed for UAV self-positioning tasks. CEUSP integrates a Dynamic Sampling Strategy (DSS) to efficiently select optimal negative samples, while the Rubik's Cube Attention (RCA) module, combined with the Context-Aware Channel Integration (CACI) module, enhances feature representation and discrimination by exploiting interdimensional interactions, inspired by the rotational mechanics of a Rubik's Cube. Extensive experiments validate the effectiveness of the proposed method, demonstrating notable improvements in feature representation and UAV self-positioning accuracy within complex urban environments. Our approach achieves state-of-the-art performance on the DenseUAV dataset, which is specifically designed for dense urban contexts, and also delivers competitive results on the widely recognized University-1652 benchmark.
