PEnG: Pose-Enhanced Geo-Localisation

Tavis Shore, Oscar Mendez, Simon Hadfield

TL;DR

PEnG is a 2-stage system: it first predicts the most likely edges of a city-scale graph representation on which a query image lies, then performs relative pose estimation within those edges to determine a precise position.

Abstract

Cross-view geo-localisation is typically performed at a coarse granularity, because densely sampled satellite image patches overlap heavily. This heavy overlap would make disambiguating patches very challenging. However, by opting for sparsely sampled patches, prior work has placed an artificial upper bound on the localisation accuracy that is possible: even a perfect oracle system cannot achieve accuracy greater than the average separation of the tiles. To overcome this limitation, we propose combining cross-view geo-localisation and relative pose estimation to increase precision to a level practical for real-world application. We develop PEnG, a 2-stage system which first predicts the most likely edges from a city-scale graph representation upon which a query image lies. It then performs relative pose estimation within these edges to determine a precise position. PEnG presents the first technique to utilise both viewpoints available within cross-view geo-localisation datasets to enhance precision to a sub-metre level, with some examples achieving centimetre-level accuracy. Our proposed ensemble achieves state-of-the-art precision, with a relative Top-5m retrieval improvement of 213% over previous works, and decreases the median Euclidean distance error by 96.90%, from the previous best of 734m down to 22.77m, when evaluating with 90 degree horizontal FOV images. Code will be made available: tavisshore.co.uk/PEnG
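As a quick check on the figures quoted in the abstract, the relative reduction in median error follows directly from the two reported distances. The sketch below is only an arithmetic illustration; the function name `relative_reduction` is hypothetical and not part of the PEnG code.

```python
def relative_reduction(previous: float, current: float) -> float:
    """Percentage reduction going from `previous` to `current`."""
    return 100.0 * (previous - current) / previous


# Median Euclidean distance errors quoted in the abstract, in metres.
reduction = relative_reduction(734.0, 22.77)
print(f"{reduction:.2f}%")  # prints 96.90%
```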

Paper Structure

This paper contains 17 sections, 5 equations, 6 figures, 2 tables, and 1 algorithm.

Figures (6)

  • Figure 2: PEnG stages: 1) Given a city-scale satellite image with an underlying graph network, CVGL estimates candidate edges within the city's graph. 2) Pose estimation along these edges yields refined geographic poses. Green denotes a query input; blue and red denote two known reference images.
  • Figure 3: Section of Manhattan graph with primary (orange) and secondary (blue) nodes displayed. Most edges have a constant yaw, motivating the utilisation of a compass.
  • Figure 4: Example primary node (road junction) cross-view image pairs. The left-hand side shows $90^\circ$ crops from panoramas and the right-hand side shows aerial images at zoom 20.
  • Figure 5: 2-stage system diagram. Stage 1 retrieves scaled similarities of reference embeddings for the latest seen primary node, yielding ordered candidate edges. Stage 2 processes the edges consecutively until a threshold is met or all edges have been processed. Position along an edge is first estimated against all reference images; pose is then estimated using the two predicted adjacent images.
  • Figure 6: Pose estimates within each candidate edge are scored by their 3-axis Euclidean distance from the mean rotational pose of the secondary nodes. This is possible because the orientations of edges are known within graph representations.
  • ...and 1 more figure
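The edge scoring described in the Figure 6 caption and the early-stopping loop described in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation. It assumes rotations are given as (yaw, pitch, roll) Euler angles in degrees, that the mean rotation is taken as a per-axis circular mean, and that a lower score is better. The helper names and the `threshold_deg` value are hypothetical.

```python
import numpy as np


def wrap_deg(angles):
    """Wrap angle differences into the range [-180, 180)."""
    return (np.asarray(angles, dtype=float) + 180.0) % 360.0 - 180.0


def circular_mean_deg(angles):
    """Per-axis circular mean of an (N, 3) array of Euler angles in degrees."""
    rad = np.deg2rad(np.asarray(angles, dtype=float))
    return np.rad2deg(np.arctan2(np.sin(rad).mean(axis=0), np.cos(rad).mean(axis=0)))


def edge_score(estimated_rot, secondary_rots):
    """3-axis Euclidean distance between an estimated rotation and the
    mean rotation of an edge's secondary nodes (assumed: lower is better)."""
    diff = wrap_deg(np.asarray(estimated_rot, dtype=float) - circular_mean_deg(secondary_rots))
    return float(np.linalg.norm(diff))


def select_edge(candidates, threshold_deg=5.0):
    """Walk the ordered candidate edges, stopping at the first whose score
    falls below the (hypothetical) threshold; otherwise return the best seen."""
    best = None
    for edge_id, estimated_rot, secondary_rots in candidates:
        score = edge_score(estimated_rot, secondary_rots)
        if best is None or score < best[1]:
            best = (edge_id, score)
        if score < threshold_deg:
            break
    return best


# Illustrative candidates: (edge id, estimated rotation, secondary-node rotations).
candidates = [
    ("edge_a", [90.0, 0.0, 0.0], [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]),
    ("edge_b", [1.0, 0.0, 0.0], [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]),
]
chosen_edge, chosen_score = select_edge(candidates)
```

With these made-up inputs, the estimate for `edge_b` matches the mean orientation of its secondary nodes, so it is selected. The per-axis circular mean is a simplification; averaging in a proper rotation representation such as quaternions would be more robust for large rotations.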