Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency

Yannis Kalantidis, Mert Bülent Sarıyıldız, Rafael S. Rezende, Philippe Weinzaepfel, Diane Larlus, Gabriela Csurka

TL;DR

This paper addresses the vulnerability of visual localization to appearance changes by reworking the retrieval step. The proposed approach, Ret4Loc, augments training with language-guided synthetic variants generated by text-to-image models and enforces geometry-aware constraints. It combines HOW-based landmark retrieval with domain randomization, 11 targeted text prompts for synthesis, and a geometric-consistency framework that filters and samples the synthetic data. Results on outdoor and indoor benchmarks show consistent improvements over state-of-the-art retrieval methods, with strong gains from aggregated variant losses and geometry-aware sampling, and set new baselines on several datasets. The work demonstrates the practical value of language-guided data augmentation for robust localization and points toward efficient data synthesis and validation in perception pipelines.

Abstract

State-of-the-art visual localization approaches generally rely on a first image retrieval step whose role is crucial. Yet, retrieval often struggles when facing varying conditions, due to e.g. weather or time of day, with dramatic consequences on the visual localization accuracy. In this paper, we improve this retrieval step and tailor it to the final localization task. Among the several changes we advocate for, we propose to synthesize variants of the training set images, obtained from generative text-to-image models, in order to automatically expand the training set towards a number of nameable variations that particularly hurt visual localization. After expanding the training set, we propose a training approach that leverages the specificities and the underlying geometry of this mix of real and synthetic images. We experimentally show that those changes translate into large improvements for the most challenging visual localization datasets. Project page: https://europe.naverlabs.com/ret4loc

Paper Structure (30 sections, 2 equations, 15 figures, 7 tables)

Figures (15)

  • Figure 1: Gains in localization accuracy using our Ret4Loc models compared to the state of the art (black dot). We show results for our best models trained on only real (Ret4Loc) or real and synthetic images (Ret4Loc + Synth), for 7 outdoor and 1 indoor dataset splits. Axes in log-scale.
  • Figure 2: (Left) Synthetic variants for several prompts (the full set of variants is shown in Fig. \ref{fig:all_variants}). (Right) Estimated local correspondences between two matching images before and after alteration.
  • Figure 3: Localization accuracy as a function of the top-$k$ retrieved images for Ret4Loc models and the state of the art. Top: Pose approximation (EWB) protocol. Bottom: Structure-from-Motion (SfM) based protocol. Ret4Loc-HOW-Synth variations using geometric consistency are denoted with a "+". In each plot, we further denote the top gains achieved using Ret4Loc models over the best performing competing method. See \ref{sec:app_extended_results} for a complete set of results on more datasets.
  • Figure 4: (Left) Percentage of synthetic pairs dropped for different thresholds $\tau$ on the geometric consistency score $s$. (Right) Percentage of synthetic pairs dropped per textual prompt for $\tau=0.5$.
  • Figure 5: Relative gains on Robotcar-v2 for different $\tau$ values for Ret4Loc-HOW-Synth+ (solid lines) and Ret4Loc-HOW-Synth++ (dashed lines) for localization across three error thresholds.
  • ...and 10 more figures
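
The quantity plotted in Figure 4 (left), the percentage of synthetic pairs dropped at a given threshold $\tau$ on the geometric consistency score $s$, can be illustrated with a minimal sketch. It assumes that a pair is dropped when its score falls strictly below $\tau$; the captions do not state the drop rule, so this direction is an assumption. The function name `fraction_dropped` and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def fraction_dropped(scores: Sequence[float], tau: float) -> float:
    """Return the fraction of pairs whose score is strictly below tau.

    Assumption (not confirmed by the source): a synthetic pair is
    dropped when its geometric consistency score s satisfies s < tau.
    """
    if not scores:
        return 0.0
    dropped = sum(1 for s in scores if s < tau)
    return dropped / len(scores)


# Hypothetical consistency scores for five synthetic pairs.
example_scores = [0.1, 0.4, 0.5, 0.8, 0.9]

# Sweep a few thresholds, mirroring the x-axis of such a plot.
for tau in (0.25, 0.5, 0.75):
    pct = 100 * fraction_dropped(example_scores, tau)
    print(f"tau={tau}: {pct:.0f}% of pairs dropped")
```

Grouping the scores by textual prompt before calling the same function would give a per-prompt breakdown like the one described for Figure 4 (right).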