Scalable and Efficient Hierarchical Visual Topological Mapping
Saravanabalagi Ramachandran, Jonathan Horgan, Ganesh Sistu, John McDonald
TL;DR
This work addresses scalable place recognition in hierarchical topological mapping by evaluating how the choice of global descriptor affects performance. It extends HTMap to compare hand-crafted and learned descriptors (PHOG, NetVLAD, LoST, DIPVAE) and introduces a framework for quantifying descriptor continuity and distinctiveness. The main finding is that unsupervised DIPVAE-based global descriptors deliver substantially lower runtimes with recall comparable to the other descriptors, especially on long trajectories such as St Lucia, a result attributed to their favorable continuity and distinctiveness. The study also offers a methodology for selecting and tuning descriptors for efficient, real-time hierarchical localization at large scale.
Abstract
Hierarchical topological representations can significantly reduce search times within mapping and localization algorithms. Although recent research has shown the potential of such approaches, limited consideration has been given to the suitability and comparative performance of different global feature representations in this context. In this work, we evaluate state-of-the-art hand-crafted and learned global descriptors using a hierarchical topological mapping technique on benchmark datasets, and report how the choice of global descriptor affects the results. Although learned descriptors have been incorporated into place recognition methods to improve retrieval accuracy and overall recall, most studies have not adequately addressed scalability and efficiency on longer trajectories. Based on our empirical analysis of multiple runs, we identify continuity and distinctiveness as the crucial characteristics of a global descriptor for efficient and scalable hierarchical mapping, and present a methodology for quantifying and contrasting these characteristics across different global descriptors. Our study demonstrates that global descriptors based on a Variational Autoencoder (VAE) trained without supervision excel in these characteristics and achieve significantly lower runtime. Running on a consumer-grade desktop, the VAE-based descriptor is up to 2.3x faster than the second-best global descriptor, NetVLAD, and up to 9.5x faster than the hand-crafted descriptor, PHOG, on the longest track evaluated (St Lucia, 17.6 km), without sacrificing overall recall.
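
To make the claim about reduced search times concrete, the following is a minimal sketch of a two-level lookup. It is not the HTMap implementation or the method evaluated in this work. The cosine distance, the mean-descriptor location representative, the threshold-based grouping rule, and every function name (`build_map`, `hierarchical_query`, `make_demo_descriptors`) are assumptions chosen for illustration, and the toy 3-D vectors stand in for real global descriptors.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean; used here as an assumed location representative."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_map(descriptors, threshold):
    """Group a sequence of global descriptors into locations.

    Assumed rule: start a new location when the incoming descriptor is
    farther than `threshold` from the current location's representative.
    Returns a list of locations, each a list of image indices.
    """
    locations = [[0]]
    for idx in range(1, len(descriptors)):
        rep = mean_vector([descriptors[i] for i in locations[-1]])
        if cosine_distance(descriptors[idx], rep) > threshold:
            locations.append([idx])
        else:
            locations[-1].append(idx)
    return locations


def hierarchical_query(query, descriptors, locations, top_k=1):
    """Rank locations first, then search only images in the top_k locations.

    Returns (best_image_index, number_of_descriptor_comparisons).
    """
    reps = [mean_vector([descriptors[i] for i in loc]) for loc in locations]
    ranked = sorted(range(len(locations)),
                    key=lambda j: cosine_distance(query, reps[j]))
    candidates = [i for j in ranked[:top_k] for i in locations[j]]
    best = min(candidates, key=lambda i: cosine_distance(query, descriptors[i]))
    return best, len(locations) + len(candidates)


def make_demo_descriptors():
    """Toy data: three places, four slowly varying descriptors per place."""
    bases = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    descriptors = []
    for p, base in enumerate(bases):
        for k in range(4):
            d = list(base)
            d[(p + 1) % 3] += 0.02 * k  # small drift within a place
            descriptors.append(d)
    return descriptors
```

On the toy data, `build_map(make_demo_descriptors(), threshold=0.1)` yields three locations of four images each, and a query near the second place is resolved with 7 descriptor comparisons (3 representatives plus 4 images) rather than the 12 a flat search would need. The sketch also hints at why the two characteristics matter. Descriptors that change smoothly along the trajectory keep each location contiguous, and descriptors that differ strongly between places keep the top-level ranking reliable.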
