SinGeo: Unlock Single Model's Potential for Robust Cross-View Geo-Localization

Yang Chen, Xieyuanli Chen, Junxiang Li, Jie Tang, Tao Wu

TL;DR

SinGeo is a simple yet powerful framework that enables a single model to achieve robust cross-view geo-localization without additional modules or explicit transformations, and it is the first to introduce a curriculum learning strategy for robust CVGL.

Abstract

Robust cross-view geo-localization (CVGL) remains challenging despite rapid recent progress. Existing methods still rely on field-of-view (FoV)-specific training paradigms, in which models are optimized under a fixed FoV but collapse when tested on unseen FoVs and unknown orientations. This limitation necessitates deploying multiple models to cover diverse variations. Although some studies have explored dynamic FoV training by simply randomizing FoVs, they fail to achieve robustness across diverse conditions because they implicitly assume that all FoVs are equally difficult. To address this gap, we present SinGeo, a simple yet powerful framework that enables a single model to achieve robust cross-view geo-localization without additional modules or explicit transformations. SinGeo employs a dual discriminative learning architecture that enhances intra-view discriminability within both the ground and satellite branches, and it is the first to introduce a curriculum learning strategy for robust CVGL. Extensive evaluations on four benchmark datasets show that SinGeo achieves state-of-the-art (SOTA) results under diverse conditions and notably outperforms methods trained specifically for extreme FoVs. Beyond superior performance, SinGeo also exhibits cross-architecture transferability. Furthermore, we propose a consistency evaluation method that quantitatively assesses model stability under varying views, providing an explainable perspective for understanding and advancing robustness in future CVGL research. Code will be made available upon acceptance.
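The abstract contrasts plain FoV randomization, which treats every FoV as equally difficult, with a curriculum that schedules difficulty over training. The sketch below is one possible easy-to-hard FoV schedule, not SinGeo's published one: it assumes a linear pacing function, a narrowest FoV of 70°, and a random heading offset (a circular shift of an equirectangular panorama) to simulate unknown orientation. The names `fov_lower_bound`, `sample_view`, and `crop_columns`, and all constants, are hypothetical.

```python
import random


def fov_lower_bound(epoch: int, total_epochs: int,
                    min_fov: float = 70.0, max_fov: float = 360.0) -> float:
    """Narrowest FoV (degrees) allowed at `epoch` under a hypothetical linear pacing.

    Training starts with full panoramas only (easy) and ends with FoVs as
    narrow as `min_fov` (hard)."""
    if total_epochs <= 1:
        return min_fov
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return max_fov - progress * (max_fov - min_fov)


def sample_view(epoch: int, total_epochs: int, rng: random.Random,
                min_fov: float = 70.0) -> tuple[float, float]:
    """Draw (fov, heading) in degrees for one ground image.

    The FoV is uniform in [current lower bound, 360]; the heading is uniform
    in [0, 360] and models the unknown camera orientation."""
    low = fov_lower_bound(epoch, total_epochs, min_fov)
    fov = rng.uniform(low, 360.0)
    heading = rng.uniform(0.0, 360.0)
    return fov, heading


def crop_columns(width: int, fov: float, heading: float) -> list[int]:
    """Column indices of an equirectangular panorama of `width` pixels that
    cover `fov` degrees starting at `heading`; indices wrap at the 360° seam."""
    start = int(round(heading / 360.0 * width)) % width
    n = min(width, max(1, int(round(fov / 360.0 * width))))
    return [(start + i) % width for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    for epoch in (0, 5, 9):
        fov, heading = sample_view(epoch, 10, rng)
        cols = crop_columns(1024, fov, heading)
        print(f"epoch {epoch}: fov={fov:.1f} deg, heading={heading:.1f} deg, {len(cols)} columns")
```

Sampling uniformly from `[fov_lower_bound(t), 360]` keeps easier wide-FoV views in the mix while gradually admitting harder narrow ones; a step-wise or exponential pacing function would be equally plausible, since this summary does not specify the schedule.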

Paper Structure

This paper contains 32 sections, 9 equations, 8 figures, and 13 tables.

Figures (8)

  • Figure 1: Comparison of different training paradigms for cross-view geo-localization. SinGeo achieves superior overall performance and better transferability under both unknown orientations and various limited fields of view. Previous methods, including DSM [dsm], SEH [seh], SAIG-D [saig], and ConGeo [congeo], are reported with their Top-1 recall on the CVUSA [cvusa] dataset.
  • Figure 2: The proposed dual discriminative learning architecture. “Dual” denotes the two-branch design that enhances intra-view discriminability through self-supervision in both the ground and satellite branches. Specifically, each branch applies its own transformations to generate $I_g^*$ and $I_s^*$. The learning objective combines contrastive losses for intra-view discrimination and cross-view alignment.
  • Figure 3: Illustration of the curriculum learning framework and its inspiration. Left: an intuitive analogy of curriculum learning, in which a freshman gradually refines their geo-localization ability as they gain familiarity and practice. Right: the predetermined curriculum schedules difficulty according to the epoch $t$. After being updated by dual discriminative learning, the encoder is passed to the Dynamic Similarity Sampling block [sample4geo], which generates negatives for the next epoch.
  • Figure 4: Visualization of the consistency of SinGeo, ConGeo, and Sample4Geo on the CVUSA dataset. The left column shows a sample under varying orientations, where SinGeo exhibits stronger consistency in both the ground and satellite branches. The right column shows a sample under different FoVs, where SinGeo's activation heatmaps remain stable across views.
  • Figure 5: Qualitative evaluation on a CVUSA sample. Left: a query image under varied orientations and FoVs, together with its ground-truth satellite image. Right: top-1 retrieved images and activation heatmaps of different methods. Green circles denote the regions of the satellite image that correspond to the limited-FoV ground images.
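Figure 2 describes the training objective only as a combination of contrastive losses for intra-view discrimination and cross-view alignment. A minimal NumPy sketch of such an objective, $\mathcal{L} = \mathcal{L}_{\text{NCE}}(g, s) + \lambda\,[\mathcal{L}_{\text{NCE}}(g, g^*) + \mathcal{L}_{\text{NCE}}(s, s^*)]$, assuming a symmetric InfoNCE loss with a shared temperature and a single hypothetical weight `lam` on the two intra-view terms; `g_aug` and `s_aug` stand in for the transformed views $I_g^*$ and $I_s^*$. The paper's exact loss formulation and weighting are not given in this summary, and all function names are hypothetical.

```python
import numpy as np


def _normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """L2-normalize each row so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def _diag_cross_entropy(logits: np.ndarray) -> float:
    """Cross-entropy where the target class for row i is column i."""
    z = logits - logits.max(axis=1, keepdims=True)               # numerical stability
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # row-wise log-softmax
    return float(-np.mean(np.diag(log_prob)))


def info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE over a batch: (a[i], b[i]) is the positive pair and
    every other row of the batch serves as a negative."""
    logits = _normalize(a) @ _normalize(b).T / temperature
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(logits.T))


def dual_objective(g, g_aug, s, s_aug, lam: float = 1.0, temperature: float = 0.1) -> float:
    """Cross-view alignment (ground vs. satellite) plus intra-view
    discrimination within each branch (original vs. transformed view)."""
    cross_view = info_nce(g, s, temperature)
    intra_view = info_nce(g, g_aug, temperature) + info_nce(s, s_aug, temperature)
    return cross_view + lam * intra_view


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g, s = rng.normal(size=(2, 8, 128))  # batch of 8 ground / satellite descriptors
    loss = dual_objective(g, g + 0.1 * rng.normal(size=g.shape),
                          s, s + 0.1 * rng.normal(size=s.shape))
    print(f"dual objective: {loss:.4f}")
```

Negatives here are simply the other pairs in the batch; Figure 3 indicates that SinGeo composes negatives through Dynamic Similarity Sampling, which this sketch does not model.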
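The abstract and Figure 4 mention a consistency evaluation of model stability under varying views, but this summary does not define the metric. Two illustrative proxies, both assumptions rather than the paper's method: the mean pairwise cosine similarity among embeddings of the same location rendered under K orientations or FoVs, and the fraction of locations whose top-1 retrieved satellite image is identical across all K variants. `view_consistency` and `top1_agreement` are hypothetical names.

```python
import numpy as np


def _normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """L2-normalize along the last axis."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def view_consistency(query_emb: np.ndarray) -> float:
    """Mean pairwise cosine similarity among the K view variants of each location.

    query_emb has shape (N, K, D): N locations, K variants (orientations/FoVs),
    D-dimensional descriptors. A value of 1.0 means the descriptor is fully
    invariant to the tested view changes."""
    _, k, _ = query_emb.shape
    if k < 2:
        raise ValueError("need at least two view variants per location")
    e = _normalize(query_emb)
    sim = np.einsum("nkd,njd->nkj", e, e)  # (N, K, K) cosine similarities
    off_diag = ~np.eye(k, dtype=bool)      # drop each variant's self-similarity
    return float(sim[:, off_diag].mean())


def top1_agreement(query_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """Fraction of locations whose top-1 reference (e.g. satellite) match is
    the same for all K view variants. ref_emb has shape (M, D)."""
    scores = _normalize(query_emb) @ _normalize(ref_emb).T  # (N, K, M)
    top1 = scores.argmax(axis=-1)                           # (N, K)
    return float(np.all(top1 == top1[:, :1], axis=1).mean())
```

The first proxy measures stability of the descriptor itself, while the second measures stability of the retrieval decision; a model can score well on one and poorly on the other.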