QdaVPR: A novel query-based domain-agnostic model for visual place recognition

Shanshan Wan, Lai Kang, Yingmei Wei, Tianrui Shen, Haixuan Wang, Chao Zuo

TL;DR

QdaVPR, a novel query-based domain-agnostic VPR model, achieves state-of-the-art performance on multiple VPR benchmarks with significant domain variations and attains the best Recall@1 and Recall@10 on nearly all test scenarios.

Abstract

Visual place recognition (VPR), which aims to predict the location of an image based solely on its visual features, is a fundamental task in robotics and autonomous systems. Domain variation remains one of the main challenges in VPR and is relatively unexplored. Existing VPR models attempt to achieve domain agnosticism either by training on large-scale datasets that inherently contain some domain variations, or by being specifically adapted to particular target domains. In practice, the former lacks explicit domain supervision, while the latter generalizes poorly to unseen domain shifts. This paper proposes a novel query-based domain-agnostic VPR model called QdaVPR. First, a dual-level adversarial learning framework is designed to encourage domain invariance in both the query features that form the global descriptor and the image features from which these query features are derived. Then, a triplet supervision based on query combinations is designed to enhance the discriminative power of the global descriptors. To support the learning process, we augment a large-scale VPR dataset using style transfer methods, generating various synthetic domains with corresponding domain labels as auxiliary supervision. Extensive experiments show that QdaVPR achieves state-of-the-art performance on multiple VPR benchmarks with significant domain variations. Specifically, it attains the best Recall@1 and Recall@10 on nearly all test scenarios: 93.5%/98.6% on Nordland (seasonal changes), 97.5%/99.0% on Tokyo24/7 (day-night transitions), and the highest Recall@1 across almost all weather conditions on the SVOX dataset. Our code will be released at https://github.com/shuimushan/QdaVPR.
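The Recall@1 and Recall@10 figures quoted above follow the usual retrieval convention in VPR: a query counts as correct if at least one of its K nearest database descriptors is a true match. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `recall_at_k`, the cosine-similarity ranking, and the `positives` format (one list of matching database indices per query) are assumptions made for illustration.

```python
import numpy as np

def recall_at_k(query_desc, db_desc, positives, ks=(1, 10)):
    """Fraction of queries with at least one true match among the top-k retrievals.

    query_desc: (Q, D) array of global descriptors for the query images.
    db_desc:    (N, D) array of global descriptors for the database images.
    positives:  list of length Q; positives[i] holds the database indices
                that count as correct matches for query i (assumed format).
    """
    # L2-normalize so that the inner product ranks by cosine similarity.
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    d = db_desc / np.linalg.norm(db_desc, axis=1, keepdims=True)
    sims = q @ d.T
    # Database indices sorted from most to least similar, per query.
    order = np.argsort(-sims, axis=1)
    recalls = {}
    for k in ks:
        hits = sum(
            bool(set(order[i, :k].tolist()) & set(positives[i]))
            for i in range(len(positives))
        )
        recalls[k] = hits / len(positives)
    return recalls
```

In practice the database is usually searched with an approximate or exact nearest-neighbor index rather than a dense similarity matrix; the dense version is used here only to keep the sketch short.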

Paper Structure

This paper contains 17 sections, 9 equations, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: Overview of the domain-agnostic VPR models. (a) Most models acquire some domain agnosticism through training on large-scale datasets. (b) Some VPR models are specially designed to be robust to target domains but generalize poorly to unseen domains. (c) Our model achieves domain agnosticism via a novel dual-level adversarial learning framework, which makes it more robust to VPR tasks under domain shift.
  • Figure 2: Overview of the QdaVPR model. (a) The base architecture. Regardless of whether the input image is from the original GSV-cities or the generated six-domain synthetic GSV-cities datasets, this architecture outputs a global descriptor and $N_c$ query combinations. (b) The framework for dual-level adversarial learning. Each of the $L \times M$ query features and the $L$ domain features (extracted from the $L$ image features) is fed into a domain discriminator, producing output logits with six values. These values predict the specific domain of the input image. Note that output logits are generated only when the input image is from the generated six-domain dataset; otherwise, the outputs are as shown in (a). During inference, the red blocks are discarded, and only the global descriptor is output. The channel dimension is omitted from the figure for clarity. See text for details.
  • Figure 3: Mutual reinforcement in the dual-level adversarial learning framework. Query features improve image features' domain agnosticism through negative gradient flow, and image features reciprocally produce domain-agnostic query features via the framework flow. The symbols are defined in the text.
  • Figure 4: Average attention map visualization for (b) QdaVPR and (c) BoQ. In the input images (a), green blocks indicate regions containing buildings. In (b) and (c), green blocks denote consistently high attention to the buildings shown in (a) across different weather conditions, whereas red blocks indicate high attention under only one specific weather condition and low attention under the other.
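
The captions of Figures 2 and 3 describe a domain discriminator that maps each of the $L \times M$ query features to six domain logits, with a negative gradient flowing back into the features. One common way to obtain such a negative gradient in adversarial domain learning is a gradient reversal step; whether QdaVPR uses exactly this mechanism is an assumption here, not something the captions confirm. The sketch below uses a single linear discriminator in NumPy with manually derived gradients. The names `discriminator_step`, `W`, and `lam`, and the sizes chosen in the usage note, are hypothetical.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def discriminator_step(features, W, domain_labels, lam=1.0):
    """One forward/backward pass through a linear domain discriminator.

    features:      (n, C) array, e.g. the L*M query features of one image.
    W:             (C, 6) discriminator weights (hypothetical linear head).
    domain_labels: (n,) integer domain index in [0, 6) for each feature.
    lam:           scale of the reversed gradient (assumed hyperparameter).
    """
    n = features.shape[0]
    logits = features @ W                      # (n, 6) domain logits
    probs = softmax(logits)
    # Cross-entropy between predicted and true domains.
    loss = -np.log(probs[np.arange(n), domain_labels]).mean()
    # Gradient of the mean cross-entropy with respect to the logits.
    dlogits = probs.copy()
    dlogits[np.arange(n), domain_labels] -= 1.0
    dlogits /= n
    grad_W = features.T @ dlogits              # discriminator descends on this
    grad_features = dlogits @ W.T              # ordinary gradient w.r.t. features
    # Gradient reversal: the feature extractor receives the negated gradient,
    # so a descent step on it increases the discriminator's loss.
    reversed_grad = -lam * grad_features
    return loss, grad_W, reversed_grad
```

For example, with assumed sizes L = 2, M = 4, and C = 8, `features` has shape (8, 8) and all eight rows share the domain label of the input image. A gradient-descent step on `grad_W` trains the discriminator to identify the domain, while the same kind of step on `reversed_grad` pushes the features toward being harder to classify by domain.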