BIMCaP: BIM-based AI-supported LiDAR-Camera Pose Refinement

Miguel Arturo Vega Torres, Anna Ribic, Borja García de Soto, André Borrmann

TL;DR

BIMCaP tackles SLAM drift in indoor mapping with affordable sensors by tying RGB–LiDAR measurements to a pre-existing BIM. It fuses sparse LiDAR with camera data, densifies depth via linear interpolation and CompletionFormer, and aligns the resulting map to a BIM-derived reference through bundle adjustment guided by semantic landmarks. On the ConSLAM dataset, the method reduces translational error by over 4 cm compared with state-of-the-art baselines and also improves rotational alignment, enabling more accurate digital twins for construction, safety, and emergency response in GPS-denied environments. Overall, BIMCaP provides a cost-effective, BIM-informed pathway to robust indoor 3D reconstruction.

Abstract

This paper introduces BIMCaP, a novel method to integrate mobile 3D sparse LiDAR data and camera measurements with pre-existing building information models (BIMs), enhancing fast and accurate indoor mapping with affordable sensors. BIMCaP refines sensor poses by leveraging a 3D BIM and employing a bundle adjustment technique to align real-world measurements with the model. Experiments using real-world open-access data show that BIMCaP achieves superior accuracy, reducing translational error by over 4 cm compared to current state-of-the-art methods. This advancement enhances the accuracy and cost-effectiveness of 3D mapping methodologies like SLAM. BIMCaP's improvements benefit various fields, including construction site management and emergency response, by providing up-to-date, aligned digital maps for better decision-making and productivity. Link to the repository: https://github.com/MigVega/BIMCaP

Paper Structure

This paper contains 15 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Overview of the proposed BIMCaP framework for sensor pose refinement. The depth maps are used to project the semantic maps (created from the images) into the 3D space using the approximated initial poses (drifted due to SLAM). Different terms aim to correlate the data measured over permanent elements (i.e., reliable landmarks) with the BIM. Moreover, a geometric term ensures geometric consistency among real-world images.
  • Figure 2: Depth completion with sparse LiDAR point cloud: (a) original image from the ConSLAM dataset with the original sparse LiDAR scan projected onto it; (b) depth map using only CompletionFormer; (c) depth map using only linear interpolation; and (d) depth map using linear interpolation and CompletionFormer. Variant (d) yields the best results: it is smoother than (c) and more coherent with the measurements than (b).
  • Figure 3: Reference map preparation: (a) original 3D BIM (without ceiling); (b) uniformly sampled 3D point cloud with semantic information from the BIM; and (c) vectorized semantic floor plan, from which the walls and columns (in black and yellow) are used for pose refinement in the subsequent pose optimization step.
  • Figure 4: Semantic segmentation over 2D images of the ConSLAM dataset: (a) inference with the original Grounding DINO algorithm; (b) inference result after replacing DINO with pre-trained RTMDet for object detection. Panel (b) includes predicted labels for the walls in the background, which are critical for our camera pose refinement framework. Similarly, (c) and (d) show, respectively, the results of Grounding DINO and of our proposed pipeline.
  • Figure 5: Features used for optimization: (a) top-view semantically segmented map generated with the ground-truth poses and the segmentation results of walls (in red) and floor (in blue), as explained in Section \ref{step2b}; (b) map created with the synthetic poses of Exp. 1 (obtained as explained in Section \ref{step3a}), in which the COLMAP features are visible; (c) view from an indoor observer's perspective of the point cloud with highlighted Scale-Invariant Feature Transform (SIFT) features used in the geometric term for optimization.
  • ...and 2 more figures
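
The caption of Figure 1 describes projecting per-pixel semantic maps into 3D space using the depth maps and the initial (drifted) poses. The sketch below illustrates that step only, and is not taken from the BIMCaP repository. It assumes a pinhole camera model, depth measured along the camera z-axis, and a 4×4 camera-to-world pose; the function name `backproject_labels` and all parameter names are hypothetical.

```python
import numpy as np


def backproject_labels(depth, labels, K, T_world_cam):
    """Lift per-pixel semantic labels into 3D world coordinates.

    Assumptions (not from the paper): pinhole intrinsics, depth along the
    camera z-axis, and T_world_cam mapping camera to world coordinates.

    depth       : (H, W) metric depth; 0 or NaN marks pixels without depth
    labels      : (H, W) integer class ids from the segmentation step
    K           : (3, 3) pinhole intrinsic matrix
    T_world_cam : (4, 4) homogeneous camera-to-world pose
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Keep only pixels that carry a usable depth value.
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]

    # Invert the pinhole projection to get camera-frame coordinates.
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]

    # Apply the pose in homogeneous coordinates, then drop the last row.
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)  # shape (4, N)
    pts_world = (T_world_cam @ pts_cam)[:3].T               # shape (N, 3)

    return pts_world, labels[valid]
```

Running this per frame and concatenating the outputs would give a labeled point cloud. Because the initial poses are drifted, that cloud would be misaligned with the BIM-derived reference map, which is the error the pose refinement is meant to reduce.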