
LiVisSfM: Accurate and Robust Structure-from-Motion with LiDAR and Visual Cues

Hanqing Jiang, Liyang Zhou, Zhuang Zhang, Yihao Yu, Guofeng Zhang

TL;DR

LiVisSfM is an accurate and robust Structure-from-Motion (SfM) reconstruction pipeline that fully combines LiDAR and visual cues. It also introduces an incremental voxel updating strategy that makes voxel map updates efficient during LiDAR frame registration and LiDAR-visual bundle adjustment (BA) optimization.

Abstract

This paper presents LiVisSfM, an accurate and robust Structure-from-Motion (SfM) reconstruction pipeline that fully combines LiDAR and visual cues. Most existing LiDAR-inertial odometry (LIO) and LiDAR-inertial-visual odometry (LIVO) methods rely heavily on LiDAR registration coupled with an Inertial Measurement Unit (IMU). In contrast, we propose a LiDAR-visual SfM method that registers each LiDAR frame to a LiDAR voxel map using a Point-to-Gaussian residual metric, and combines this registration with LiDAR-visual bundle adjustment (BA) and explicit loop closure in a bundle optimization, achieving accurate and robust LiDAR pose estimation without depending on an IMU. We also propose an incremental voxel updating strategy for efficient voxel map updates during LiDAR frame registration and LiDAR-visual BA optimization. Experiments on the public KITTI benchmark and a variety of self-captured datasets show that LiVisSfM recovers LiDAR poses and reconstructs dense point clouds more accurately and robustly than state-of-the-art LIO and LIVO methods.
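
The two ideas named in the abstract, a voxel map that is updated incrementally and a Point-to-Gaussian residual, can be illustrated with a minimal sketch. This is not the paper's implementation: the class `VoxelGaussian`, the helpers `solve3` and `point_to_gaussian_sq`, the Welford-style running update, and the regularization term `eps` are all assumptions made for illustration, and the residual is assumed here to be a squared Mahalanobis distance between a point and a voxel's Gaussian.

```python
class VoxelGaussian:
    """Running mean and covariance of the 3D points in one voxel (hypothetical)."""

    def __init__(self):
        self.n = 0
        self.mean = [0.0, 0.0, 0.0]
        # Accumulated outer products of deviations (Welford's M2 matrix).
        self.m2 = [[0.0] * 3 for _ in range(3)]

    def insert(self, p):
        """Fold one new point in with O(1) work, without revisiting old points."""
        self.n += 1
        delta = [p[i] - self.mean[i] for i in range(3)]
        self.mean = [self.mean[i] + delta[i] / self.n for i in range(3)]
        delta2 = [p[i] - self.mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                self.m2[i][j] += delta[i] * delta2[j]

    def covariance(self, eps=1e-6):
        """Sample covariance plus eps * I (the eps term is an assumption)."""
        if self.n < 2:
            raise ValueError("need at least two points for a covariance")
        return [
            [self.m2[i][j] / (self.n - 1) + (eps if i == j else 0.0)
             for j in range(3)]
            for i in range(3)
        ]


def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def point_to_gaussian_sq(p, voxel):
    """Squared Mahalanobis distance d^T Sigma^{-1} d from p to the voxel Gaussian."""
    d = [p[i] - voxel.mean[i] for i in range(3)]
    y = solve3(voxel.covariance(), d)
    return sum(d[i] * y[i] for i in range(3))
```

With a voxel built from points lying on a flat patch, a query point displaced along the patch normal receives a much larger residual than one displaced by the same amount within the patch, which is the behavior that makes such a metric useful for registering scans against planar structure.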

Paper Structure

This paper contains 17 sections, 14 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: 3D point cloud reconstruction of "Outdoor Tianren Office" captured by a MetaCam-Air handheld LiDAR scanner. (a) The input LiDAR frames and fisheye camera frames. (b) The estimated LiDAR trajectory and the fused LiDAR point cloud reconstructed by FAST-LIO2 [xu2022fast]. (c) The LiDAR trajectory and LiDAR point cloud reconstructed by VoxelMap [yuan2022efficient], which collapses severely. (d) The LiDAR trajectory and LiDAR point cloud from our LiVisSfM pipeline, with magnified regions highlighted by red rectangles to show local reconstruction details. FAST-LIO2 produces misalignment in the reconstructed dense point cloud, whereas our method significantly reduces long-range accumulation errors and yields finer local geometric details.
  • Figure 2: System overview of our LiVisSfM, which consists of three main modules: LiDAR and visual map initialization, alternating LiDAR and visual pose registration, and mapping. The mapping module includes a global LiDAR-visual BA and an explicit loop closure, carried out on a visual map of sparse feature points and a LiDAR map represented as a voxel map. The LiDAR frames are finally fused using the optimized LiDAR poses and colorized by the visual frames into a complete dense point cloud.
  • Figure 3: Illustration of plane voxels in the "Outdoor Tianren Office" case. (a) Plane voxels extracted by fusing all the registered LiDAR frames. (b) The final fused dense point cloud. Only planar structures form plane voxels, into which no further LiDAR points are inserted.
  • Figure 4: Alternating estimation of LiDAR and visual poses. (a-c) show the LiDAR poses estimated by GICP [segal2009generalized], NDT [biber2003normal], and our method, respectively, on the KITTI "07" sequence. The LiDAR frames are fused together to compare the pose accuracy of the three methods qualitatively, and the absolute pose error (APE) and relative pose error (RPE) of the estimated LiDAR poses, reported as MAE/RMSE in meters against KITTI's ground-truth (GT) poses, are given to evaluate it quantitatively. GICP shows obvious pose estimation errors compared to NDT, but NDT still exhibits long-range accumulation error in the fused LiDAR point cloud. Our method performs best in both the estimated LiDAR poses and the fused point cloud, which is confirmed by the lowest APE and RPE.
  • Figure 5: Ablation study of LiDAR-visual BA, the time-related weight, and explicit loop closure. (a) With visual BA only, the LiDAR poses show obvious accumulation error. (b) LiDAR-visual BA significantly improves LiDAR pose accuracy, but accumulated drift still occurs. (c) Adding the time-related weight further reduces the accumulated pose error. (d) Combining LiDAR-visual BA, the time-related weight, and explicit loop closure gives the best reconstruction in both the estimated LiDAR poses and the fused LiDAR point cloud, as shown in the magnified regions highlighted by the blue rectangles, and also the best pose accuracy as measured by APE and RPE (MAE/RMSE in meters).
  • ...and 4 more figures
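
The captions for Figures 4 and 5 report pose accuracy as MAE/RMSE of the APE in meters. The sketch below shows how those two statistics are commonly computed for the translational part of the APE. It assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same coordinate frame, and it ignores rotation; the function names are hypothetical, and the paper's exact evaluation protocol is not stated in the text above.

```python
import math


def ape_translation(est_positions, gt_positions):
    """Per-frame Euclidean distance (meters) between estimated and GT positions."""
    if len(est_positions) != len(gt_positions):
        raise ValueError("trajectories must have the same number of frames")
    return [math.dist(e, g) for e, g in zip(est_positions, gt_positions)]


def mae(errors):
    """Mean absolute error: the average of the per-frame errors."""
    return sum(errors) / len(errors)


def rmse(errors):
    """Root mean square error: penalizes large per-frame errors more than MAE."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Because RMSE squares each error before averaging, a trajectory with a few large drift episodes has an RMSE noticeably above its MAE, while a uniformly offset trajectory has the two values equal.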