CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement

Boni Hu, Lin Chen, Runjian Chen, Shuhui Bu, Pengcheng Han, Haowei Li

TL;DR

CurriculumLoc combines a carefully designed multi-stage refinement pipeline with a novel keypoint detection and description method that has global semantic awareness and local geometric verification, yielding a visual geolocalization solution that is robust to appearance and viewpoint changes while providing accurate global location estimates.

Abstract

Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images, taken at some unknown location, to a set of geo-tagged reference images. Existing methods, devoted to semantic feature representation, have been evolving towards robustness to a wide variety of differences between query and reference images, including illumination and viewpoint changes, as well as scale and seasonal variations. However, practical visual geolocalization approaches need to be robust to appearance changes and extreme viewpoint variations while providing accurate global location estimates. Inspired by curriculum design, in which humans learn general knowledge first and then delve into professional expertise, we first recognize the semantic scene and then measure its geometric structure. Our approach, termed CurriculumLoc, involves a carefully designed multi-stage refinement pipeline and a novel keypoint detection and description method with global semantic awareness and local geometric verification. We rerank candidates and solve a particular cross-domain perspective-n-point (PnP) problem based on these keypoints and their corresponding descriptors, so that the position is refined incrementally. Extensive experimental results on our collected dataset, TerraTrack, and a benchmark dataset, ALTO, demonstrate that our approach achieves the aforementioned desirable characteristics of a practical visual geolocalization solution. Additionally, we achieve new high recall@1 scores of 62.6% and 94.5% on ALTO under two different distance metrics, respectively. The dataset, code, and trained models are publicly available at https://github.com/npupilab/CurriculumLoc.
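
To make the recall@1 figures quoted above concrete, the following is a minimal sketch of how a distance-thresholded recall@N is commonly computed in retrieval-based geolocalization. It is not taken from the CurriculumLoc code base: the function name `recall_at_n`, the planar (x, y) coordinates in metres, and the toy data are all assumptions made for illustration, and the 20 m default simply mirrors the threshold named in the Figure 5 caption.

```python
import math


def recall_at_n(query_positions, ranked_candidate_positions, n=1, dist_threshold_m=20.0):
    """Fraction of queries whose top-n candidates include one within the threshold.

    Positions are hypothetical (x, y) tuples in metres in a local planar frame;
    this is an illustrative assumption, not the paper's evaluation code.
    """
    if not query_positions:
        raise ValueError("need at least one query")
    if len(query_positions) != len(ranked_candidate_positions):
        raise ValueError("one ranked candidate list is required per query")

    hits = 0
    for query, candidates in zip(query_positions, ranked_candidate_positions):
        # A query counts as a hit if any of its first n candidates is close enough.
        if any(math.dist(query, cand) <= dist_threshold_m for cand in candidates[:n]):
            hits += 1
    return hits / len(query_positions)


# Toy data: three queries, each with two candidates ranked best-first.
queries = [(0.0, 0.0), (100.0, 100.0), (50.0, 0.0)]
ranked = [
    [(5.0, 5.0), (300.0, 300.0)],      # top-1 is about 7.1 m away: hit at n=1
    [(160.0, 100.0), (110.0, 100.0)],  # top-1 is 60 m away, top-2 is 10 m away
    [(50.0, 40.0), (50.0, 90.0)],      # both candidates are beyond 20 m: miss
]

r1 = recall_at_n(queries, ranked, n=1)
r2 = recall_at_n(queries, ranked, n=2)
print(f"recall@1 = {r1:.3f}, recall@2 = {r2:.3f}")
```

Under this definition, reranking can raise recall@1 without changing the retrieved set: in the second toy query, swapping the two candidates would turn a miss at n=1 into a hit.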

Paper Structure (33 sections, 18 equations, 15 figures, 4 tables)

Figures (15)

  • Figure 1: Visualization of the pixel-correspondence supervision and the soft-detection scores of matched images during training. White represents low soft-detection scores, while red signifies higher ones. Training lowers the soft-detection scores on repetitive structures (e.g. ground, floor, walls) and raises them on more distinctive points. The soft-detection scores are optimized under the supervision of pixel correspondences.
  • Figure 2: (a): The detailed pipeline of the proposed CurriculumLoc. (b): Details of Swin-Descriptors in (a). (c): The schematic of the cross-domain PnP in (a).
  • Figure 3: The architecture of Swin-Descriptors, which is composed of an encoder, a bottleneck, a decoder, and skip connections. The encoder, bottleneck, and decoder are all built from Swin Transformer blocks.
  • Figure 4: Swin Transformer block.
  • Figure 5: Comparison of retrieval recall@1 results with dist = 20 m.
  • ...and 10 more figures