Unified Few-shot Crack Segmentation and its Precise 3D Automatic Measurement in Concrete Structures

Pengru Deng, Jiapeng Yao, Chun Li, Su Wang, Xinrun Li, Varun Ojha, Xuhui He, Takashi Matsumoto

TL;DR

This paper addresses the challenge of robust, generalized concrete crack inspection across diverse environments by combining a few-shot crack segmentation approach with a foundation-model–driven refinement and a LiDAR–camera–IMU multi-sensor SLAM framework. It introduces a four-module system: calibrated multi-sensor data acquisition, 2D crack segmentation refined by SAM prompts, dense 3D crack reconstruction with MLS/SOR denoising, and automatic 3D crack width and localization measurements within the colored point cloud. The key contributions include a generalizable crack segmentation workflow leveraging SAM, a dense multi-frame multi-modal 3D reconstruction pipeline, and an automated 3D crack measurement method validated on field data with submillimeter accuracy and competitive reconstruction quality. The framework promises practical impact for on-site inspection and digital twin applications by delivering accurate, automated crack metrics directly in 3D space, under varied geometries and conditions.
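
The denoising step named above can be illustrated with a minimal sketch, assuming "SOR" denotes statistical outlier removal as commonly used for point clouds. The function name, the neighbour count `k`, and the `std_ratio` threshold are illustrative assumptions, not values taken from the paper, and the brute-force neighbour search is only practical for small clouds.

```python
import math
from statistics import mean, pstdev


def statistical_outlier_removal(points, k=4, std_ratio=1.0):
    """Keep points whose mean distance to their k nearest neighbours is at most
    (global mean + std_ratio * global standard deviation) of that quantity.

    points: list of (x, y, z) tuples. k and std_ratio are assumed defaults.
    """
    if len(points) <= k:
        return list(points)  # too few points to estimate neighbourhood statistics

    mean_knn = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point, O(n^2) overall.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(mean(dists[:k]))

    threshold = mean(mean_knn) + std_ratio * pstdev(mean_knn)
    return [p for p, d in zip(points, mean_knn) if d <= threshold]


# A flat 3x3 patch of points plus one stray point far from the surface.
grid = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
cloud = grid + [(50.0, 50.0, 50.0)]
kept = statistical_outlier_removal(cloud)
print(len(cloud), "->", len(kept))
```

In practice a spatial index (for example a k-d tree) would replace the quadratic search; the thresholding logic stays the same.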

Abstract

Visual-spatial systems have become increasingly essential in concrete crack inspection. However, existing methods often lack adaptability to diverse scenarios, exhibit limited robustness in image-based approaches, and struggle with curved or complex geometries. To address these limitations, this study proposes a framework for two-dimensional (2D) crack detection, three-dimensional (3D) reconstruction, and automatic 3D crack measurement that integrates computer vision technologies with multi-modal simultaneous localization and mapping (SLAM). First, building on a base DeepLabv3+ segmentation model and incorporating specific refinements using the foundation model Segment Anything Model (SAM), we developed a crack segmentation method that generalizes strongly to unfamiliar scenarios and generates precise 2D crack masks. To enhance the accuracy and robustness of 3D reconstruction, Light Detection and Ranging (LiDAR) point clouds were used together with image data and segmentation masks. By leveraging both image-based and LiDAR-based SLAM, we developed a multi-frame, multi-modal fusion framework that produces dense, colorized point clouds, capturing crack semantics at real-world 3D scale. Furthermore, the geometric attributes of cracks were measured automatically and directly within the dense 3D point cloud space, overcoming the limitations of conventional 2D image-based measurements. This makes the method suitable for structural components with curved and complex 3D geometries. Experimental results across various concrete structures highlight the improvements and advantages of the proposed method, demonstrating its effectiveness, accuracy, and robustness in real-world applications.

Paper Structure

This paper contains 29 sections, 13 equations, 16 figures, 5 tables, and 1 algorithm.

Figures (16)

  • Figure 1: Workflow of the proposed framework. First, the camera, LiDAR, and IMU are related through extrinsic calibration. The LiDAR generates point clouds for LiDAR-inertial odometry (LIO) and 3D reconstruction, while the camera provides images for semantic segmentation. A fusion module and post-processing then yield automatic crack detection and reconstruction.
  • Figure 2: Calibration process. First, point cloud and image data are collected. Next, the SuperGlue algorithm is used to identify matching points between the point cloud and the image. From these correspondences, the extrinsic transformation is estimated so that the point cloud map can be fused with the image data. This step involves calculating the relative rotation matrix $\mathbf{R}_L^C \in SO(3)$ and the translation vector $\mathbf{t}_L^C \in \mathbb{R}^3$.
  • Figure 3: Dataset composition and relationships in K-shot supervised segmentation.
  • Figure 4: Network structure of DeepLabv3+, where Conv is the convolution layer. The encoder module encodes multi-scale contextual information by applying atrous convolution at multiple scales, while the simple yet effective decoder module refines the segmentation results along object boundaries.
  • Figure 5: Framework Process for Foundation Model Refinement. The image is first processed using DeepLabv3+ to generate a segmentation mask. This mask is then used to create prompt points for the foundation model, which performs the inference. The results are evaluated for quality, and the optimized mask is selected based on this evaluation.
  • ...and 11 more figures
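
How the extrinsics from the Figure 2 caption fuse LiDAR points with image data can be sketched as follows. The rigid transform $\mathbf{p}_C = \mathbf{R}_L^C \mathbf{p}_L + \mathbf{t}_L^C$ follows from the caption's notation; the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the absence of lens distortion, and the function name `lidar_to_pixel` are assumptions made for illustration and are not taken from the paper.

```python
def lidar_to_pixel(p_lidar, R, t, fx, fy, cx, cy):
    """Project one LiDAR point into the image plane.

    p_lidar: (x, y, z) in the LiDAR frame.
    R: 3x3 rotation (nested lists), t: length-3 translation, LiDAR -> camera.
    fx, fy, cx, cy: assumed pinhole intrinsics (no distortion model).
    Returns (u, v) in pixels, or None if the point lies behind the camera.
    """
    # Rigid transform into the camera frame: p_C = R * p_L + t
    x, y, z = (
        sum(R[i][j] * p_lidar[j] for j in range(3)) + t[i] for i in range(3)
    )
    if z <= 0.0:
        return None  # not visible: at or behind the image plane
    # Pinhole projection onto pixel coordinates.
    return (fx * x / z + cx, fy * y / z + cy)


# Illustrative values only: identity rotation, zero translation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uv = lidar_to_pixel((1.0, 2.0, 4.0), identity, (0.0, 0.0, 0.0),
                    fx=100.0, fy=100.0, cx=320.0, cy=240.0)
print(uv)  # (345.0, 290.0)
```

Once a point has pixel coordinates, it can take its color, and the crack label from the segmentation mask, from that pixel, which is the sense in which the calibration links the two sensors.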