
AIR-HLoc: Adaptive Retrieved Images Selection for Efficient Visual Localisation

Changkun Liu, Jianhao Jiao, Huajian Huang, Zhengyang Ma, Dimitrios Kanoulas, Tristan Braud

TL;DR

AIR-HLoc introduces adaptive retrieved image selection for hierarchical visual localisation. It uses the similarity between the query's global descriptor and those of the database images to adjust the number of retrieved reference images per query. The paper demonstrates a strong link between global descriptor similarity and the proportion of local feature matches. It defines a per-query score $S(I^q)$ that drives retrieval and a mean localisation improvement per retrieved image (MLIP) metric that quantifies how much each retrieved image contributes. Across Cambridge Landmarks, 7Scenes, and Aachen Day-Night-v1.1, AIR-HLoc achieves state-of-the-art pose accuracy while reducing feature matching cost by up to 30%, with substantial latency gains on edge hardware. The work offers practical guidance for choosing $k$ per query and opens avenues for further latency-sensitive localisation. Key formulas include the score $S(I^q) = \frac{1}{n} \sum_{j \in J} \cos(g^{q}, g^{j})$ with $n = 3$, where $J$ is the set of the top-$n$ retrieved database images, and the MLIP definitions $\zeta_T(k)$ and $\zeta_R(k)$.
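The score $S(I^q)$ averages cosine similarities between the query's global descriptor $g^q$ and the descriptors $g^j$ of its closest database images. The following is a minimal pure-Python sketch, not the authors' code. It assumes descriptors are plain float lists and that $J$ is the set of the $n = 3$ database images with the highest cosine similarity to the query. The function names `cosine` and `query_score` are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity cos(g^q, g^j) between two global descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def query_score(g_q: Sequence[float],
                database: Sequence[Sequence[float]],
                n: int = 3) -> float:
    """S(I^q): mean cosine similarity to the n most similar database images.

    Assumption: J = the top-n database images ranked by cosine similarity.
    Divides by len(top) so a database smaller than n still yields a mean.
    """
    sims = sorted((cosine(g_q, g_j) for g_j in database), reverse=True)
    top = sims[:n]
    return sum(top) / len(top)
```

For example, `query_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])` averages the three largest similarities. These are 1.0, about 0.707, and 0.0, so the result is $(1 + 1/\sqrt{2})/3 \approx 0.569$.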

Abstract

State-of-the-art hierarchical localisation pipelines (HLoc) employ image retrieval (IR) to establish 2D-3D correspondences by selecting the top-$k$ most similar images from a reference database. While increasing $k$ improves localisation robustness, it also linearly increases computational cost and runtime, creating a significant bottleneck. This paper investigates the relationship between global and local descriptors, showing that greater similarity between the global descriptors of query and database images increases the proportion of feature matches. Low-similarity queries benefit significantly from increasing $k$, while high-similarity queries quickly reach diminishing returns. Building on these observations, we propose an adaptive strategy that adjusts $k$ based on the similarity between the query's global descriptor and those in the database, effectively mitigating the feature-matching bottleneck. Our approach reduces processing time without sacrificing accuracy. Experiments on three indoor and outdoor datasets show that AIR-HLoc reduces feature matching time by up to 30% while preserving state-of-the-art accuracy. The results demonstrate that AIR-HLoc enables latency-sensitive localisation systems.
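The adaptive strategy can be sketched as a rule that maps a query's similarity score to a retrieval budget. This is an illustration under stated assumptions, not the paper's rule. The figure captions state only that hard queries keep the full top-$k$, while medium and easy queries receive a smaller $k^*$. The thresholds `t_hard` and `t_easy` and the per-class budgets `k_medium` and `k_easy` are hypothetical placeholders. The sketch also computes the retrieved ratio $k^*/k$: the average number of images retrieved per query, relative to HLoc's fixed $k$.

```python
from typing import Iterable


def adaptive_k(score: float, k: int, k_medium: int, k_easy: int,
               t_hard: float = 0.5, t_easy: float = 0.8) -> int:
    """Number of database images to retrieve for one query.

    Low-similarity (hard) queries keep the full budget k. Higher-similarity
    queries, which see diminishing returns, get smaller budgets.
    Thresholds and budgets are illustrative assumptions, not paper values.
    """
    if score < t_hard:
        return k          # hard query: top-k, as in HLoc
    if score < t_easy:
        return k_medium   # medium query: reduced budget
    return k_easy         # easy query: smallest budget


def retrieved_ratio(scores: Iterable[float], k: int,
                    k_medium: int, k_easy: int) -> float:
    """Average per-query budget divided by the fixed HLoc budget k."""
    budgets = [adaptive_k(s, k, k_medium, k_easy) for s in scores]
    return sum(budgets) / (len(budgets) * k)
```

Take $k = 10$, hypothetical budgets of 5 and 2, and scores `[0.3, 0.6, 0.9, 0.95]`. The per-query budgets are 10, 5, 2, and 2, so the retrieved ratio is 19/40 = 0.475. Because matching cost grows linearly with the number of retrieved images, feature matching work scales roughly with this ratio.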

Paper Structure

This paper contains 20 sections, 5 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: AIR-HLoc offers a practical way to improve localisation efficiency while maintaining accuracy by adaptively retrieving a different number of images for each query.
  • Figure 2: Subfigures (a)-(f) show the correlation between cosine similarity (-1 to 1) and match ratio (0 to 1) on the Aachen Day-Night-v1.1 [sattler2018benchmarking, zhang2021reference], Cambridge Landmarks [kendall2015posenet], and 7Scenes [glocker2013real, shotton2013scene] datasets, using three IR models (AP-GeM, NetVLAD, and EigenPlaces). The value in parentheses in each subfigure is the average PCC and SRC across all scenes in a dataset.
  • Figure 3: Average mean and median pose errors (ATE, ARE) across all scenes of the Cambridge and 7Scenes datasets, plotted against $k$. HLoc retrieves the top-$k$ similar images for every query, whereas AIR-HLoc retrieves $k$ similar images only for hard queries and $k^*$ images for medium and easy queries. AIR-HLoc (NV) uses NetVLAD as the image retrieval module, while AIR-HLoc (EP) uses EigenPlaces. The average retrieved ratio ($k^*/k$) over all queries is shown in Figure 5(a).
  • Figure 4: Percentage of test frames localised with high (0.25 m, $2^{\circ}$), medium (0.5 m, $5^{\circ}$), and low (5 m, $10^{\circ}$) accuracy [sattler2018benchmarking] (higher is better) for HLoc and AIR-HLoc, plotted against $k$. The average retrieved ratio ($k^*/k$) is shown in Figure 5(a).
  • Figure 5: (a) Retrieved ratio: the ratio ($0<k^*/k\leq1$) of the average number of images retrieved per test frame by AIR-HLoc to the number retrieved by HLoc. (b) Feature matching time (runtime) of HLoc in the SuperPoint (SP) + SuperGlue (SG) setting on the three datasets, plotted against $k$.