SpaGBOL: Spatial-Graph-Based Orientated Localisation

Tavis Shore, Oscar Mendez, Simon Hadfield

TL;DR

SpaGBOL recasts cross-view geo-localisation as a graph-structured problem, enabling learning from geospatially organised sequences and the generation of previously unseen routes in urban environments. It pairs a dual-branch neural network with a GNN to produce spatially strong embeddings, adds a bearing-based retrieval filter, and achieves state-of-the-art results on an unseen city graph. A dense, multi-city graph dataset with multiple streetview images per node is released to foster generalisation under time, weather, and viewpoint variations. The approach targets GNSS-denied localisation in urban canyons and shows practical potential for real-world robotic and navigation systems, especially when combined with Bearing Vector Matching and yaw cues. Its main limitation is that localisation is restricted to road junctions; proposed future work includes hierarchical sub-graphs and sensor fusion for finer-grained positioning.

Abstract

Cross-View Geo-Localisation within urban regions is challenging in part due to the lack of geo-spatial structuring within current datasets and techniques. We propose utilising graph representations to model sequences of local observations and the connectivity of the target location. Modelling as a graph enables generating previously unseen sequences by sampling with new parameter configurations. To leverage this newly available information, we propose a GNN-based architecture, producing spatially strong embeddings and improving discriminability over isolated image embeddings. We outline SpaGBOL, introducing three novel contributions. 1) The first graph-structured dataset for Cross-View Geo-Localisation, containing multiple streetview images per node to improve generalisation. 2) Introducing GNNs to the problem, we develop the first system that exploits the correlation between node proximity and feature similarity. 3) Leveraging the unique properties of the graph representation - we demonstrate a novel retrieval filtering approach based on neighbourhood bearings. SpaGBOL achieves state-of-the-art accuracies on the unseen test graph - with relative Top-1 retrieval improvements on previous techniques of 11%, and 50% when filtering with Bearing Vector Matching on the SpaGBOL dataset.
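The idea of generating previously unseen sequences by sampling a graph can be illustrated with a minimal sketch. The abstract does not specify the sampling procedure, so the function `random_depth_first_walk`, its parameters, and the toy junction graph below are all assumptions made for illustration; only the notion of a random depth-first walk of a chosen length comes from the paper's figure captions.

```python
import random


def random_depth_first_walk(adjacency, start, length, rng=None):
    """Sample up to `length` distinct nodes by randomised depth-first search.

    Hypothetical helper: `adjacency` maps each node (junction) to a list of
    neighbouring nodes (junctions reachable by one road).
    """
    rng = rng or random.Random()
    walk = [start]
    visited = {start}
    stack = [start]
    while stack and len(walk) < length:
        current = stack[-1]
        unvisited = [n for n in adjacency[current] if n not in visited]
        if not unvisited:
            stack.pop()  # dead end: backtrack to the previous node
            continue
        nxt = rng.choice(unvisited)  # random choice makes each walk different
        visited.add(nxt)
        walk.append(nxt)
        stack.append(nxt)
    return walk


# Toy junction graph (assumed data, not taken from the SpaGBOL dataset).
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

walk = random_depth_first_walk(adjacency, "A", 3, random.Random(0))
print(walk)
```

Changing `length`, the start node, or the random seed yields different sequences over the same graph, which is one plausible reading of "sampling with new parameter configurations".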

Paper Structure (16 sections, 9 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: At inference time, a KDTree is constructed from exhaustive reference walks sampled from the city's graph. A randomly selected query walk passes through the network, and the corresponding embeddings are retrieved from the KDTree in descending order of similarity. These are further filtered to the set of compatible nodes with Bearing Vector Matching (BVM).
  • Figure 2: SpaGBOL is a two-branch neural network with no weight-sharing. From left to right, the network performs the following actions: (1) image feature extraction with ConvNeXt-T, (2) depth-first walk image features $\rightarrow$ GNN embedding (red), (3) production of neighbour bearing vectors, (4) embedding retrieval from the KDTree, (5) filtering of retrievals with bearings to return the final geo-coordinates.
  • Figure 3: Corpus graph of London City Centre. Each graph is square with sides of length 2 km. Nodes (junctions) are shown here in blue, with black edges (roads).
  • Figure 4: Random depth-first walk sample of length 3. Image features are extracted from each node, passing through a GNN to produce the final node embedding.
  • Figure 5: Splitting corpus graphs into train/validation/test sets. Validation graphs are unconnected subgraphs of each training graph. The test graph is a wholly unseen city graph.
  • ...and 3 more figures
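
The retrieve-then-filter flow described in the captions for Figures 1 and 2 can be sketched roughly as follows. This is not the authors' implementation: a brute-force distance ranking stands in for the KDTree to keep the example dependency-free, and the compatibility rule (each query bearing must match a distinct candidate bearing within a tolerance) is an assumed stand-in for filtering with bearings. All function names, the `tolerance_deg` parameter, and the toy data are hypothetical.

```python
import math


def angular_difference(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def bearings_compatible(query_bearings, candidate_bearings, tolerance_deg=15.0):
    """Assumed rule: same number of roads, greedily matched one-to-one."""
    if len(query_bearings) != len(candidate_bearings):
        return False
    remaining = list(candidate_bearings)
    for q in query_bearings:
        match = next(
            (c for c in remaining if angular_difference(q, c) <= tolerance_deg),
            None,
        )
        if match is None:
            return False
        remaining.remove(match)  # each candidate bearing is used at most once
    return True


def rank_references(query_embedding, reference_embeddings):
    """Order reference nodes from most to least similar (smallest distance first).

    Brute-force stand-in for a KDTree query.
    """
    return sorted(
        reference_embeddings,
        key=lambda node: math.dist(query_embedding, reference_embeddings[node]),
    )


def filtered_retrieval(query_embedding, query_bearings,
                       reference_embeddings, reference_bearings,
                       tolerance_deg=15.0):
    """Rank by embedding distance, then drop bearing-incompatible nodes."""
    ranked = rank_references(query_embedding, reference_embeddings)
    return [
        node for node in ranked
        if bearings_compatible(query_bearings, reference_bearings[node], tolerance_deg)
    ]


# Toy data (assumed): 2-D embeddings and neighbour bearings in degrees.
reference_embeddings = {"n1": (0.0, 0.0), "n2": (1.0, 0.0), "n3": (0.2, 0.1)}
reference_bearings = {
    "n1": [0, 90, 180],
    "n2": [10, 100, 270],
    "n3": [5, 95, 265],
}
query_embedding = (0.05, 0.0)
query_bearings = [8, 98, 268]

ranked = rank_references(query_embedding, reference_embeddings)
filtered = filtered_retrieval(
    query_embedding, query_bearings, reference_embeddings, reference_bearings
)
print(ranked)    # ['n1', 'n3', 'n2']
print(filtered)  # ['n3', 'n2']
```

In this toy case the nearest embedding, `n1`, is discarded because its junction layout does not match the query's bearings, so `n3` becomes the top result. This shows how a bearing filter can change Top-1 retrieval even when the embeddings themselves are unchanged.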