IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes

Fengtian Lang, Ruiye Ming, Zikang Yuan, Xin Yang

TL;DR

The paper tackles robust, real-time loop detection for driving scenes using LiDAR data. It introduces IFTD, a BEV-based Image Feature Triangle Descriptor built from Shi-Tomasi keypoints extracted from BEV projections, enabling rotation- and translation-invariant matching and $4$-DOF pose estimation between keyframes. A two-stage verification combines a rapid hash-voxel candidate search over triangle sides with a BEV image-similarity check and SVD-based pose estimation under RANSAC to confirm loops. Experiments on KITTI, Mulran, and NCLT show that IFTD outperforms state-of-the-art methods (STD and Contour Context) in accuracy and robustness while running about $50\times$ faster than STD, highlighting its suitability for real-time autonomous driving applications. The approach demonstrates that BEV-derived triangle descriptors can provide strong geometric cues with low overhead for scalable loop closure in complex environments, and the authors release the code to foster community development.
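
To make the invariance claim concrete, the following minimal sketch describes a triangle by its sorted side lengths, which do not change under planar rotation or translation, and quantizes them into a hashable key. The function names, the `resolution` parameter, and the key layout are illustrative assumptions, not the authors' implementation.

```python
import math

def triangle_descriptor(p1, p2, p3, resolution=0.5):
    """Return (sorted side lengths, quantized key) for a 2D triangle.

    `resolution` is a hypothetical bin size used to turn the side
    lengths into a hashable key for a lookup table.
    """
    sides = sorted([math.dist(p1, p2), math.dist(p2, p3), math.dist(p3, p1)])
    key = tuple(int(s / resolution) for s in sides)
    return sides, key

def rigid_transform(p, yaw, tx, ty):
    """Rotate a 2D point by `yaw` (radians), then translate it."""
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

# The same three keypoints seen from two different vehicle poses.
tri = [(0.0, 0.0), (4.2, 0.0), (0.0, 3.2)]
moved = [rigid_transform(p, yaw=0.7, tx=12.0, ty=-5.0) for p in tri]

sides_a, key_a = triangle_descriptor(*tri)
sides_b, key_b = triangle_descriptor(*moved)
print(key_a, key_b)  # both keys are (6, 8, 10)
```

Because both views map to the same key, a hash-table lookup on the key can retrieve candidate matches without comparing every pair of triangles. A practical system would also need to handle side lengths that fall near a bin boundary.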

Abstract

In this work, we propose a fast and robust Image Feature Triangle Descriptor (IFTD) based on the STD method, aimed at improving the efficiency and accuracy of place recognition in driving scenarios. We extract keypoints from the BEV projection image of a point cloud and construct triangle descriptors from these keypoints. By matching these feature triangles, we achieve precise place recognition and compute a 4-DOF pose estimate between two keyframes. Furthermore, we employ image similarity inspection to perform the final place recognition. Experimental results on three public datasets demonstrate that IFTD achieves greater robustness and accuracy than state-of-the-art methods with low computational overhead.
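
As an illustration of the 4-DOF pose computation, the sketch below recovers yaw and a 3D translation from matched keypoints. It assumes the four degrees of freedom are yaw plus $x$, $y$, $z$ translation, and it uses the closed-form least-squares solution for a planar rotation in place of an SVD, with no RANSAC outlier rejection. The function name and the sample data are hypothetical.

```python
import math

def estimate_4dof(src, dst):
    """Least-squares yaw and translation mapping src points onto dst.

    src, dst: equal-length lists of matched (x, y, z) points.
    Assumes correct correspondences (no outlier rejection).
    """
    n = len(src)
    cs = [sum(p[i] for p in src) / n for i in range(3)]  # source centroid
    cd = [sum(p[i] for p in dst) / n for i in range(3)]  # target centroid
    sin_sum = cos_sum = 0.0
    for (sx, sy, _), (dx, dy, _) in zip(src, dst):
        ax, ay = sx - cs[0], sy - cs[1]
        bx, by = dx - cd[0], dy - cd[1]
        cos_sum += ax * bx + ay * by  # accumulated dot products
        sin_sum += ax * by - ay * bx  # accumulated 2D cross products
    yaw = math.atan2(sin_sum, cos_sum)
    c, s = math.cos(yaw), math.sin(yaw)
    tx = cd[0] - (c * cs[0] - s * cs[1])
    ty = cd[1] - (s * cs[0] + c * cs[1])
    tz = cd[2] - cs[2]  # height offset from the centroid difference
    return yaw, tx, ty, tz

# Synthetic check: transform known points, then recover the transform.
true_pose = (0.3, 2.0, -1.0, 0.5)  # yaw, tx, ty, tz
src = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5), (-3.0, 1.0, 1.0), (2.0, -2.0, 0.2)]
c0, s0 = math.cos(true_pose[0]), math.sin(true_pose[0])
dst = [(c0 * x - s0 * y + true_pose[1],
        s0 * x + c0 * y + true_pose[2],
        z + true_pose[3]) for x, y, z in src]
estimate = estimate_4dof(src, dst)
```

On noise-free correspondences the estimate matches the true pose to floating-point precision. With real matches, a robust wrapper such as RANSAC would be needed to discard wrong triangle correspondences before the fit.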

Paper Structure

This paper contains 14 sections, 5 figures, 3 tables, 2 algorithms.

Figures (5)

  • Figure 1: Framework of our loop detection method. The yellow part is the core contribution we proposed.
  • Figure 2: Illustration of encoding the height information of a 3D point cloud to obtain a Bird's Eye View (BEV) projection image.
  • Figure 3: The impact of translation on maximum height. From $t_1$ to $t_2$, the vehicle undergoes a large lateral movement. The highest points collected at time $t_1$ are labeled $A$ and $B$, while those collected at time $t_2$ are labeled $A'$ and $B'$. Comparing these points shows how lateral translation of the vehicle changes the maximum height in the point cloud measurements.
  • Figure 4: Illustration of the Image Feature Triangle Descriptor. Fig. a shows the original 3D point cloud. Fig. b shows the BEV projection image of the point cloud. In Fig. c, the red dots are the feature points extracted from the BEV image using the Shi-Tomasi method. Fig. d shows the feature triangles constructed from the green center point and its neighboring points.
  • Figure 5: Precision-Recall curves of IFTD, STD, and Contour Context on KITTI, Mulran, and NCLT datasets. The red curve represents IFTD, the blue curve represents STD, and the green curve represents Contour Context.
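
The BEV projection described in Figure 2 can be sketched as a grid in which each cell stores a height statistic of the points that fall into it. This page does not state the exact height encoding the authors use, so the sketch below takes the per-cell maximum height, the quantity that Figure 3 shows to be sensitive to lateral translation. The function name, the `cell` size, and the `half_range` extent are illustrative assumptions.

```python
def bev_max_height(points, cell=0.4, half_range=40.0):
    """Project (x, y, z) points onto a square grid of per-cell max height.

    Empty cells stay at 0.0, so heights are assumed to be non-negative
    (for example, measured above the ground plane).
    """
    size = int(round(2 * half_range / cell))
    image = [[0.0] * size for _ in range(size)]
    for x, y, z in points:
        # Skip points outside the square region around the sensor.
        if not (-half_range <= x < half_range and -half_range <= y < half_range):
            continue
        row = min(int((x + half_range) / cell), size - 1)
        col = min(int((y + half_range) / cell), size - 1)
        image[row][col] = max(image[row][col], z)
    return image

# Tiny example: a 4 x 4 grid with 1 m cells covering [-2, 2) m.
demo_points = [
    (0.5, 0.5, 1.0),
    (0.7, 0.2, 3.0),   # same cell as the point above, but higher
    (-1.5, 1.5, 2.0),
    (5.0, 0.0, 9.0),   # out of range, ignored
]
demo = bev_max_height(demo_points, cell=1.0, half_range=2.0)
```

The resulting grid can be treated as a single-channel image, which is what allows a standard corner detector such as Shi-Tomasi to be applied to LiDAR data.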