ConGeo: Robust Cross-view Geo-localization across Ground View Variations

Li Mi; Chang Xu; Javiera Castillo-Navarro; Syrielle Montariol; Wen Yang; Antoine Bosselut; Devis Tuia

TL;DR

ConGeo tackles cross-view geo-localization (CVGL) under unknown ground-view orientation and limited field of view (FoV) by introducing a model-agnostic contrastive learning framework. It uses complementary single-view and cross-view losses to align ground-view variations with their original representations and with aerial references, enabling a single model to handle diverse ground-view configurations. Across four CVGL benchmarks and three base models, ConGeo delivers substantial improvements over orientation- or FoV-specific methods and demonstrates robustness to unseen variations. Analyses show that ConGeo reduces reliance on geometric shortcuts and emphasizes semantically consistent features, which improves practical applicability in real-world navigation and localization tasks.

Abstract

Cross-view geo-localization aims at localizing a ground-level query image by matching it to its corresponding geo-referenced aerial view. In real-world scenarios, the task requires accommodating diverse ground images captured by users with varying orientations and reduced fields of view (FoVs). However, existing learning pipelines are orientation-specific or FoV-specific, demanding separate model training for different ground view variations. Such models heavily depend on the North-aligned spatial correspondence and predefined FoVs in the training data, compromising their robustness across different settings. To tackle this challenge, we propose ConGeo, a single- and cross-view Contrastive method for Geo-localization: it enhances robustness and consistency in feature representations to improve a model's invariance to orientation and its resilience to FoV variations, by enforcing proximity between ground view variations of the same location. As a generic learning objective for cross-view geo-localization, when integrated into state-of-the-art pipelines, ConGeo significantly boosts the performance of three base models on four geo-localization benchmarks for diverse ground view variations and outperforms competing methods that train separate models for each ground view variation.
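
The idea of "enforcing proximity between ground view variations of the same location" can be illustrated with a small contrastive-loss sketch. The abstract does not give the exact form of the objective, so everything below is an assumption made for illustration: the InfoNCE formulation, the temperature value, the equal weighting of the terms, and the function names (`info_nce`, `congeo_style_loss`) are hypothetical and are not taken from the paper. Embeddings are plain Python lists so the example runs with the standard library only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i],
    with the other rows of `positives` acting as negatives."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / len(a)


def symmetric_info_nce(x, y, temperature=0.1):
    """Average of the x-to-y and y-to-x InfoNCE losses."""
    return 0.5 * (info_nce(x, y, temperature) + info_nce(y, x, temperature))


def congeo_style_loss(ground, ground_var, aerial, temperature=0.1):
    """Hypothetical combination of cross-view and single-view terms.

    ground:     embeddings of the North-aligned ground images
    ground_var: embeddings of transformed ground images (shifted / FoV-limited)
    aerial:     embeddings of the matching aerial views
    Equal weighting of the terms is an assumption, not the paper's setting.
    """
    cross_view = (symmetric_info_nce(ground, aerial, temperature)
                  + symmetric_info_nce(ground_var, aerial, temperature))
    single_view = symmetric_info_nce(ground, ground_var, temperature)
    return cross_view + single_view
```

In this sketch the single-view term pulls each transformed ground view toward its original, while the cross-view terms pull both toward the paired aerial view, so the loss is small only when all three embeddings of a location agree.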

Paper Structure (41 sections, 6 equations, 11 figures, 19 tables)

Introduction
Related Works
Cross-view Geo-localization
Contrastive Learning for Geo-localization
ConGeo
Overview
Learning Objectives
Experiments
Datasets and Evaluation Metrics
Implementation Details
Experimental settings
Results
Performance under Different Settings
Ablations
Adaptability to Different Base Models
...and 26 more sections

Figures (11)

Figure 1: ConGeo boosts robustness across ground view variations: North-aligned, unknown orientation (FoV=360$^{\circ}$), and limited fields of view (FoV=70$^{\circ}$, 90$^{\circ}$, and 180$^{\circ}$). We compare with SEH [guo2022softSEH], DSM [jointloc_2020_cvpr], and SAIG-D [zhu2023simplesaigd] and report Top-1 Recall on the CVUSA [cvusa_cvpr_2015] dataset, one of the geo-localization benchmarks.
Figure 2: ConGeo's learning pipeline. For feature representation in the left and right boxes, the North-aligned ground image ($I_q$), the transformed ground image ($I^{*}_q$), and the aerial view ($I_r$) are sent to their respective encoders. Then in the feature space, the single- and cross-view contrastive learning losses are applied to enforce the proximity of the paired images.
Figure 2: Comparison of the North-aligned setting on CVUSA and CVACT datasets. The second-best performance is underlined. "-" means the score is not provided in the original paper.
Figure 3: Examples of the top-4 retrieved images from ConGeo and the baseline when FoV=90$^{\circ}$. Images in the orange box denote the correct results.
Figure 4: ConGeo shows better orientation invariance. We cyclically shift the ground view by an angle (x-axis) and use it as the model's input to test retrieval performance. Note that "N-A" denotes the North-aligned setting and "DA" means data augmentation.
...and 6 more figures
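
The ground view variations named in the captions above (a cyclic shift by an angle, and a limited FoV such as 70, 90, or 180 degrees) can be sketched as simple operations on a 360-degree panorama. This is a minimal illustration, not the authors' preprocessing: the panorama is assumed to be a list of pixel rows whose width spans 360 degrees, the shift direction and the choice to keep the leftmost columns after shifting are arbitrary, and the function names are hypothetical.

```python
def cyclic_shift(panorama, angle_deg):
    """Rotate a 360-degree panorama around the vertical axis.

    `panorama` is a list of rows; each row is a list of pixel values whose
    columns are assumed to cover 360 degrees evenly.
    """
    width = len(panorama[0])
    shift = round(angle_deg / 360.0 * width) % width
    # Columns that fall off the left edge wrap around to the right edge.
    return [row[shift:] + row[:shift] for row in panorama]


def crop_fov(panorama, fov_deg):
    """Keep only the leading columns that cover `fov_deg` degrees."""
    width = len(panorama[0])
    keep = max(1, round(fov_deg / 360.0 * width))
    return [row[:keep] for row in panorama]


def ground_view_variation(panorama, angle_deg, fov_deg=360.0):
    """Simulate an unknown orientation, then a limited field of view."""
    return crop_fov(cyclic_shift(panorama, angle_deg), fov_deg)


if __name__ == "__main__":
    # Toy panorama: 2 rows, 8 columns, so each column covers 45 degrees.
    pano = [list(range(8)), list(range(10, 18))]
    print(ground_view_variation(pano, angle_deg=90, fov_deg=180))
```

With FoV=360 the variation is a pure rotation of the panorama, which corresponds to the unknown-orientation setting; smaller FoV values additionally discard part of the scene.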
