GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization

Pengyue Jia, Seongheon Park, Song Gao, Xiangyu Zhao, Sharon Li

TL;DR

This work tackles worldwide image geolocalization by replacing independent query–candidate embeddings with a distance-aware ranking paradigm. The authors propose GeoRanker, an LVLM-based framework that jointly encodes query–candidate interactions and is trained with a multi-order distance objective, which combines absolute and relative distance supervision to capture spatial structure. To support training, they introduce GeoRanking, the first dataset explicitly designed for geographic ranking with multimodal candidate information. Results on IM2GPS3K and YFCC4K show state-of-the-art performance, with substantial gains at the street level and at coarser scales, supported by ablations, efficiency analyses, and evidence that larger backbones improve results. The approach is relevant to practical global geolocalization and points toward further integration of LVLMs with structured spatial reasoning.

Abstract

Worldwide image geolocalization, the task of predicting GPS coordinates from images taken anywhere on Earth, poses a fundamental challenge due to the vast diversity in visual content across regions. While recent approaches adopt a two-stage pipeline of retrieving candidates and selecting the best match, they typically rely on simplistic similarity heuristics and point-wise supervision, failing to model spatial relationships among candidates. In this paper, we propose GeoRanker, a distance-aware ranking framework that leverages large vision-language models to jointly encode query-candidate interactions and predict geographic proximity. In addition, we introduce a multi-order distance loss that ranks both absolute and relative distances, enabling the model to reason over structured spatial relationships. To support this, we curate GeoRanking, the first dataset explicitly designed for geographic ranking tasks with multimodal candidate information. GeoRanker achieves state-of-the-art results on two well-established benchmarks (IM2GPS3K and YFCC4K), significantly outperforming current best methods.

Paper Structure

This paper contains 28 sections, 7 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Accuracy at 1km error threshold for G3 (Current SOTA) vs. the best candidate within top‑k retrieved results.
  • Figure 2: Overview of the distance-aware ranking framework, GeoRanker.
  • Figure 3: Hyperparameter analysis at the region level on IM2GPS3K. Trends observed at the region level are representative across different geographic levels. Results for all hyperparameters across all levels can be found in the hyperparameter analysis appendix.
  • Figure 3: Comparison with other ranking baselines.
  • Figure 4: Time efficiency.
  • ...and 7 more figures
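
The quantity plotted in Figure 1, accuracy at a 1 km error threshold for the best candidate within the top-k retrieved results, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `haversine_km` and `oracle_accuracy_at_k`, the use of the haversine great-circle distance, and the mean Earth radius constant are all assumptions made for illustration.

```python
import math

# Mean Earth radius in kilometres (assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def oracle_accuracy_at_k(truths, candidates, k, threshold_km=1.0):
    """Fraction of queries whose closest top-k candidate lies within threshold_km.

    truths:     list of (lat, lon) ground-truth coordinates, one per query.
    candidates: list of candidate lists, each ordered by retrieval rank.
    """
    hits = 0
    for (lat, lon), cands in zip(truths, candidates):
        # Distance from the ground truth to the nearest of the first k candidates.
        best = min(
            (haversine_km(lat, lon, c_lat, c_lon) for c_lat, c_lon in cands[:k]),
            default=math.inf,
        )
        hits += best <= threshold_km
    return hits / len(truths)


if __name__ == "__main__":
    # Hypothetical toy data: two queries with two retrieved candidates each.
    truths = [(0.0, 0.0), (10.0, 10.0)]
    candidates = [
        [(5.0, 5.0), (0.0, 0.001)],    # second candidate is about 0.11 km away
        [(20.0, 20.0), (30.0, 30.0)],  # no candidate is close
    ]
    for k in (1, 2):
        print(k, oracle_accuracy_at_k(truths, candidates, k))
```

This oracle score is an upper bound on what any re-ranker can achieve over the same candidate pool, so a gap between it and a method's actual accuracy indicates how much a better ranking stage could recover.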