Fast 3D point clouds retrieval for Large-scale 3D Place Recognition
Chahine-Nicolas Zede, Laurent Carrafa, Valérie Gouet-Brunet
TL;DR
This work tackles scalable retrieval for 3D point clouds in large-scale LiDAR-based place recognition by adapting the Differentiable Search Index (DSI) to 3D data. It maps 3D point-cloud descriptors to 1D docids using a Vision Transformer-based captioning step, augmented with positional and semantic encoding, achieving near $O(1)$ retrieval. The proposed DSI-3D framework introduces new docid representations, notably Positional Structured identifiers and Hilbert-curve indexing, and demonstrates retrieval performance competitive with state-of-the-art methods while greatly reducing query time on KITTI datasets. The results indicate that Hilbert-based indexing offers the best trade-off between retrieval quality and speed, highlighting the method’s potential for real-time, large-scale 3D place recognition.
Abstract
Retrieval in 3D point clouds is a challenging task: given a query, the goal is to retrieve the most similar point clouds from a reference set of 3D points. Current methods compare point cloud descriptors to identify similar ones. Because of the complexity of this comparison step, we focus on accelerating retrieval by adapting the Differentiable Search Index (DSI), a transformer-based approach originally designed for text information retrieval, to 3D point cloud retrieval. Our approach generates 1D identifiers from the point cloud descriptors, enabling direct retrieval in constant time. To adapt DSI to 3D data, we integrate Vision Transformers to map descriptors to these identifiers while incorporating positional and semantic encoding. The approach is evaluated for place recognition on a public benchmark, comparing its retrieval capabilities against state-of-the-art methods in terms of both the quality of the returned point clouds and retrieval speed.
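To make the idea of a 1D, locality-preserving identifier concrete, the following minimal sketch maps a 2D scan position to a Hilbert-curve index and formats it as a fixed-width docid string. It is an illustration under stated assumptions, not the DSI-3D pipeline: in the proposed approach a transformer generates the docid from the descriptors, whereas here the identifier is computed directly from a position. The function names `hilbert_index` and `position_to_docid`, the grid origin, the cell size, and the curve order are all hypothetical choices made for this example.

```python
def hilbert_index(x: int, y: int, order: int) -> int:
    """Map grid cell (x, y) to its position along a 2D Hilbert curve.

    The grid has 2**order cells per side. This is the classic iterative
    xy-to-distance conversion, used here only as an illustration.
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def position_to_docid(x_m: float, y_m: float,
                      x_min: float, y_min: float,
                      cell_size_m: float, order: int) -> str:
    """Quantize a metric position to a grid cell and return a zero-padded docid.

    All parameters (origin, cell size, order) are assumptions of this sketch.
    """
    n = 1 << order
    # Clamp to the grid so out-of-range positions still get a valid cell.
    col = min(max(int((x_m - x_min) // cell_size_m), 0), n - 1)
    row = min(max(int((y_m - y_min) // cell_size_m), 0), n - 1)
    width = len(str(n * n - 1))
    return str(hilbert_index(col, row, order)).zfill(width)


# A plain dictionary keyed by docid gives average constant-time lookup.
index = {}
for scan_id, (px, py) in enumerate([(3.0, 4.0), (12.0, 4.0), (75.0, 61.0)]):
    docid = position_to_docid(px, py, 0.0, 0.0, 10.0, order=3)
    index.setdefault(docid, []).append(scan_id)
```

Because consecutive Hilbert indices always correspond to adjacent grid cells, identifiers that are numerically consecutive refer to neighbouring places. The converse does not always hold: two neighbouring cells can receive indices that are far apart, so numeric closeness of identifiers is only a partial proxy for spatial proximity.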
