Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone
Tristan Amadei, Enric Meinhardt-Llopis, Benedicte Bascle, Corentin Abgrall, Gabriele Facciolo
TL;DR
The paper tackles GNSS-denied UAV localization by eliminating the need for paired UAV-reference training data. It introduces CAEVL, a lightweight, edge-based autoencoder that uses a perceptual loss and a non-contrastive VICRegL fine-tuning stage to learn domain-invariant embeddings from satellite reference imagery alone. It also introduces ViLD, a challenging high-altitude UAV benchmark released to the community, which comprises real UAV flights and large sets of satellite-derived reference crops and captures vignetting and non-nadir views at altitudes up to 1600 m. Results show that CAEVL achieves competitive localization accuracy at far lower computational cost than fully supervised methods, and that it generalizes well and is robust to cross-view shifts. The work also provides extensive ablations and robustness analyses to validate the approach.
Abstract
Image-based localization in GNSS-denied environments is critical for UAV autonomy. Existing state-of-the-art approaches rely on matching UAV images to geo-referenced satellite images; however, they typically require large-scale, paired UAV-satellite datasets for training. Such data are costly to acquire and often unavailable, limiting their applicability. To address this challenge, we adopt a training paradigm that removes the need for UAV imagery during training by learning directly from satellite-view reference images. This is achieved through a dedicated augmentation strategy that simulates the visual domain shift between satellite and real-world UAV views. We introduce CAEVL, an efficient model designed to exploit this paradigm, and validate it on ViLD, a new and challenging dataset of real-world UAV images that we release to the community. Our method achieves competitive performance compared to approaches trained with paired data, demonstrating its effectiveness and strong generalization capabilities.
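To make the idea of simulating the satellite-to-UAV domain shift concrete, here is a minimal sketch of one possible augmentation: a radial vignette applied to a reference crop. It is an illustration only, not the authors' augmentation pipeline. The function name `apply_vignette`, the `strength` parameter, and the quadratic falloff are all assumptions. Vignetting is chosen because it is one of the UAV image effects that ViLD is reported to contain.

```python
import numpy as np


def apply_vignette(image, strength=0.5):
    """Darken an image toward its corners with a radial falloff.

    Hypothetical helper, not taken from the paper.
    image:    float array of shape (H, W) or (H, W, C), values in [0, 1]
    strength: 0.0 leaves the image unchanged; 1.0 drives the corners to black
    """
    h, w = image.shape[:2]
    # Normalized coordinates in [-1, 1] along each axis.
    ys = np.linspace(-1.0, 1.0, h)[:, None]
    xs = np.linspace(-1.0, 1.0, w)[None, :]
    # Squared radius scaled so that it is 0 at the center and 1 at the corners.
    r2 = (xs ** 2 + ys ** 2) / 2.0
    # Assumed quadratic falloff; real lenses may follow a different profile.
    mask = 1.0 - strength * r2
    if image.ndim == 3:
        mask = mask[..., None]  # broadcast over the channel axis
    return image * mask


# Usage: degrade a (here synthetic) satellite crop before feeding it to a model.
crop = np.ones((5, 5))
augmented = apply_vignette(crop, strength=0.5)
```

In a training setup of the kind the abstract describes, a transform like this would be applied with randomized parameters to reference crops, so the model sees UAV-like degradations without ever seeing UAV images.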
