Scalable and Efficient Hierarchical Visual Topological Mapping

Saravanabalagi Ramachandran, Jonathan Horgan, Ganesh Sistu, John McDonald

TL;DR

This work addresses scalable place recognition under hierarchical topological mapping by evaluating how different global descriptors affect performance. It extends HTMap to compare hand-crafted and learned descriptors (PHOG, NetVLAD, LoST, DIPVAE) and introduces a framework for quantifying descriptor continuity and distinctiveness. The main finding is that unsupervised DIPVAE-based global descriptors deliver substantially lower runtimes with recall comparable to other descriptors, especially on long trajectories like St Lucia, thanks to favorable continuity and distinctiveness. The study demonstrates practical impact for real-time, large-scale mapping, offering a methodology to select and tune descriptors for efficient hierarchical localization.

Abstract

Hierarchical topological representations can significantly reduce search times within mapping and localization algorithms. Although recent research has shown the potential for such approaches, limited consideration has been given to the suitability and comparative performance of different global feature representations within this context. In this work, we evaluate state-of-the-art hand-crafted and learned global descriptors using a hierarchical topological mapping technique on benchmark datasets and present results of a comprehensive evaluation of the impact of the global descriptor used. Although learned descriptors have been incorporated into place recognition methods to improve retrieval accuracy and enhance overall recall, the problem of scalability and efficiency when applied to longer trajectories has not been adequately addressed in most research studies. Based on our empirical analysis of multiple runs, we identify continuity and distinctiveness as crucial characteristics of an optimal global descriptor that enable efficient and scalable hierarchical mapping, and present a methodology for quantifying and contrasting these characteristics across different global descriptors. Our study demonstrates that global descriptors based on an unsupervised learned Variational Autoencoder (VAE) excel in these characteristics and achieve significantly lower runtime. Running on a consumer-grade desktop, the VAE-based descriptor is up to 2.3x faster than the second-best global descriptor, NetVLAD, and up to 9.5x faster than the hand-crafted descriptor, PHOG, on the longest track evaluated (St Lucia, 17.6 km), without sacrificing overall recall performance.
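
To illustrate why a hierarchical representation can reduce search time, here is a minimal sketch of a two-level lookup: the query descriptor is first compared against one representative per location, and only the images of the nearest locations are searched afterwards. The function name `hierarchical_search`, the Euclidean distance, the mean-descriptor location representative, and the parameter `k` are assumptions made for this illustration and are not taken from the paper or from HTMap.

```python
import numpy as np


def hierarchical_search(query, location_descs, location_images, image_descs, k=3):
    """Return (best_image_id, n_comparisons) for a query global descriptor.

    query           : (D,) global descriptor of the incoming image
    location_descs  : (L, D) array, one representative descriptor per location
                      (assumed here to be the mean of the location's images)
    location_images : list of L lists holding the image ids of each location
    image_descs     : (N, D) array of per-image global descriptors
    k               : number of nearest locations kept as the reduced search space
    """
    # Level 1: compare the query against the location representatives only.
    loc_dist = np.linalg.norm(location_descs - query, axis=1)
    nearest_locs = np.argsort(loc_dist)[:k]

    # Level 2: compare the query against images inside the selected locations.
    candidates = np.array(
        [img for loc in nearest_locs for img in location_images[loc]]
    )
    img_dist = np.linalg.norm(image_descs[candidates] - query, axis=1)
    best = int(candidates[np.argmin(img_dist)])

    # Total descriptor comparisons performed across both levels.
    return best, len(location_descs) + len(candidates)
```

With, say, 10 locations of 20 images each and `k=3`, the sketch performs 10 + 60 = 70 descriptor comparisons instead of the 200 a brute-force search over all images would need. The saving depends on the descriptor grouping nearby images into few, well-separated locations, which is where the continuity and distinctiveness characteristics discussed in the abstract come in.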

Paper Structure (6 sections, 8 figures, 4 tables)


Figures (8)

  • Figure 1: An illustration of Hierarchical Mapping and Localization in the global descriptor embedding space. Locations are shown in green bubbles and their images are shown in orange bubbles connected to them. As $I_{25}$ gets processed, the hatched green circle around $L_5$ shows the reduced search space containing the 3 locations, only within which the loop closing image candidate is searched.
  • Figure 2: Plots showing beliefs after repeated posterior calculation (involving energy diffusion and normalization) initialized on a set of beliefs for 100 images. Left: original HTMap [htmap_Garcia-Fidalgo2017], where beliefs are not diffused even after 200 iterations. Right: ours, where beliefs are diffused significantly at 20 iterations and completely at 200 iterations.
  • Figure 3: Ablation of the fixes to the discrete Bayes filter. The plot shows the beliefs after processing 30 poses in a trial started with an initial belief of 1 for the first image and where subsequent poses do not update priors. See the Methodology section for explanations of fixes (i) to (iii).
  • Figure 4: Results of evaluation of 6 global descriptors across 5 benchmark datasets. Each dot represents one run with the line showing a series of runs of the corresponding global descriptor (legend is given only in the first graph to avoid clutter) on the respective dataset (shown on the left margin). Column 1 highlights that DIPVAE (both R64 and R128) variants maintain the same performance, achieving recall values similar to that of other descriptors, whilst being significantly more compact and faster to compute. Vector plots presented here are best viewed zoomed on a high resolution screen.
  • Figure 5: Screenshots of a section of a City Centre trajectory visualized in OdoViz [odoviz_9564712] showing more fragmentation (different colours along the trajectory) for a higher value of $t_\text{nn}$ 1.50 (top) compared to $t_\text{nn}$ 1.43 (bottom). The car moves from left to right, time is represented on the z-axis with newer traversals shown higher up, poses that belong to the same location are in the same colour, and image loop closure proposals are shown in red. The missed loop closure proposal due to the creation of a new location (orange) in the bottom image is circled.
  • ...and 3 more figures
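
The Figure 2 caption describes a repeated posterior calculation built from energy diffusion followed by normalization. The sketch below shows one way such a step could look for a one-dimensional belief over images. The three-tap kernel `(0.25, 0.5, 0.25)`, the uniform fallback, and the function names `diffuse_and_normalize` and `iterate` are assumptions for illustration only; the actual update used in HTMap and in the authors' fixed filter is not given in this summary.

```python
import numpy as np


def diffuse_and_normalize(belief, kernel=(0.25, 0.5, 0.25)):
    """One illustrative step: spread belief to neighbours, then renormalize."""
    belief = np.asarray(belief, dtype=float)
    # Diffusion: each image shares part of its belief with adjacent images.
    spread = np.convolve(belief, np.asarray(kernel, dtype=float), mode="same")
    total = spread.sum()
    if total == 0.0:
        # Nothing to normalize (assumed fallback): return a uniform belief.
        return np.full_like(belief, 1.0 / belief.size)
    # Normalization: keep the belief a valid probability distribution.
    return spread / total


def iterate(belief, n_iters):
    """Apply the diffusion-and-normalization step n_iters times."""
    for _ in range(n_iters):
        belief = diffuse_and_normalize(belief)
    return belief
```

Starting from a belief of 1 on a single image out of 100, repeated calls to `iterate` flatten the peak progressively while the belief keeps summing to 1, which is the qualitative behaviour the caption attributes to a working diffusion step.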