GeoSURGE: Geo-localization using Semantic Fusion with Hierarchy of Geographic Embeddings

Angel Daruna; Nicholas Meegan; Han-Pang Chiu; Supun Samarasekera; Rakesh Kumar

TL;DR

GeoSURGE addresses global image geo-localization by learning a hierarchical geographic embedding and enriching visual representations through semantic fusion. It models geography as a partitioned hierarchy of geocells, each represented by a learnable embedding, and trains via contrastive learning to align query visuals with geographic features. A semantic fusion module using latent cross-attention combines RGB appearance with semantic segmentation to produce a robust visual representation. Empirically, GeoSURGE achieves state-of-the-art results on 22 of 25 metrics across five benchmarks, underscoring the value of hierarchical geographic representations and semantic augmentation for precise geo-localization.
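The contrastive alignment between a query image and a hierarchy of geocell embeddings can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`info_nce`, `hierarchical_loss`, `predict_path`), the cosine-similarity scoring, the temperature value, the equal weighting of levels, and the toy two-level hierarchy with hand-picked 2-D embeddings. In a real system the cell embeddings would be learned parameters and the query vector would come from the visual encoder.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(query, cell_embeddings, positive_index, temperature=0.1):
    """Cross-entropy over cosine similarities between one query and all cells.

    The true geocell is the positive; every other cell at the same level
    acts as a negative (an assumption of this sketch).
    """
    q = normalize(query)
    logits = [dot(q, normalize(c)) / temperature for c in cell_embeddings]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(value - max_logit) for value in logits)
    )
    return log_sum_exp - logits[positive_index]


def hierarchical_loss(query, levels, positive_path, temperature=0.1):
    """Average the per-level loss along the true coarse-to-fine cell path."""
    losses = [
        info_nce(query, cells, index, temperature)
        for cells, index in zip(levels, positive_path)
    ]
    return sum(losses) / len(losses)


def predict_path(query, levels):
    """Pick the most similar geocell at every level of the hierarchy."""
    q = normalize(query)
    return [
        max(range(len(cells)), key=lambda i: dot(q, normalize(cells[i])))
        for cells in levels
    ]


# Toy hierarchy (hypothetical values): 2 coarse cells, then 4 fine cells.
levels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.2], [1.0, -0.2], [0.2, 1.0], [-0.2, 1.0]],
]
query = [0.9, 0.3]  # stand-in for the fused visual representation

predicted = predict_path(query, levels)
loss_true_path = hierarchical_loss(query, levels, [0, 0])
loss_wrong_path = hierarchical_loss(query, levels, [1, 3])
```

In this toy setup the loss is small when the labeled path matches the cells closest to the query and large otherwise, which is the signal that pulls visual and geographic embeddings together during training.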

Abstract

Worldwide visual geo-localization seeks to determine the geographic location of an image anywhere on Earth using only its visual content. Learned representations of geography for visual geo-localization remain an active research topic despite much progress. We formulate geo-localization as aligning the visual representation of the query image with a learned geographic representation. Our novel geographic representation explicitly models the world as a hierarchy of geographic embeddings. Additionally, we introduce an approach to efficiently fuse the appearance features of the query image with its semantic segmentation map, forming a robust visual representation. Our main experiments set new best results on 22 of 25 metrics across five benchmark datasets, compared to prior state-of-the-art (SOTA) methods and recent Large Vision-Language Models (LVLMs). Additional ablation studies support the claim that these gains are primarily driven by the combination of geographic and visual representations.
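The fusion of appearance features with a semantic segmentation map through latent cross-attention can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' module: a small set of latent vectors queries the concatenated RGB and segmentation tokens with single-head scaled dot-product attention, no learned projections are used, a residual connection is added, and the latents are mean-pooled into one vector. The names `cross_attend` and `fuse` and all token values are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents, tokens):
    """Each latent queries all tokens (single head, identity projections).

    Cost grows with len(latents) * len(tokens), so a small latent set keeps
    attention cheap even when there are many input tokens.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for latent in latents:
        scores = [
            scale * sum(l * t for l, t in zip(latent, token)) for token in tokens
        ]
        weights = softmax(scores)
        attended = [
            sum(w * token[d] for w, token in zip(weights, tokens))
            for d in range(dim)
        ]
        # Residual connection: keep the latent and add what it attended to.
        updated.append([l + a for l, a in zip(latent, attended)])
    return updated


def fuse(latents, rgb_tokens, seg_tokens):
    """Attend over both modalities at once, then mean-pool the latents."""
    tokens = rgb_tokens + seg_tokens  # list concatenation of the two token sets
    updated = cross_attend(latents, tokens)
    dim = len(updated[0])
    return [sum(v[d] for v in updated) / len(updated) for d in range(dim)]


# Hypothetical toy inputs: 2 latents, 2 RGB tokens, 1 segmentation token.
latents = [[1.0, 0.0], [0.0, 1.0]]
rgb_tokens = [[1.0, 0.0], [0.5, 0.5]]
seg_tokens = [[0.0, 1.0]]

fused = fuse(latents, rgb_tokens, seg_tokens)
```

The resulting `fused` vector plays the role of the visual representation that would then be aligned with the geographic embeddings.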
