
A 3D Reconstruction Benchmark for Asset Inspection

James L. Gray, Nikolai Goncharov, Alexandre Cardaillac, Ryan Griffiths, Jack Naylor, Donald G. Dansereau

Abstract

Asset management requires accurate 3D models to inform the maintenance, repair, and assessment of buildings, maritime vessels, and other key structures as they age. These downstream applications rely on high-fidelity models produced from aerial surveys in close proximity to the asset, enabling operators to locate and characterise deterioration or damage and plan repairs. Captured images typically have high overlap between adjacent camera poses, sufficient detail at millimetre scale, and challenging visual appearances such as reflections and transparency. However, existing 3D reconstruction datasets lack examples of these conditions, making it difficult to benchmark methods for this task. We present a new dataset with ground truth depth maps, camera poses, and mesh models of three synthetic scenes with simulated inspection trajectories and varying levels of surface condition on non-Lambertian scene content. We evaluate state-of-the-art reconstruction methods on this dataset. Our results demonstrate that current approaches struggle significantly with the dense capture trajectories and complex surface conditions inherent to this domain, exposing a critical scalability gap and pointing toward new research directions for deployable 3D reconstruction in asset inspection. Project page: https://roboticimaging.org/Projects/asset-inspection-dataset/

Paper Structure (44 sections, 9 equations, 15 figures, 8 tables)

This paper contains 44 sections, 9 equations, 15 figures, and 8 tables.

Figures (15)

  • Figure 1: We propose a new benchmark for 3D reconstruction from imagery captured in asset inspection operations. Given user-specified surface soiling, drone trajectories, and asset meshes, we generate synthetic data using Blender that matches the close proximity and high overlap of data captured in industrial asset inspection operations. After adding emulated signal-dependent sensor noise, we benchmark the camera pose and 3D reconstruction accuracy of both end-to-end transformer architectures and traditional structure-from-motion and multi-view stereo pipelines against ground truth data from our simulator.
  • Figure 2: Renders of the office building, crane and bridge scenes. The office scene was rendered with four surface conditions corresponding to very low, low, medium and high levels of soiling.
  • Figure 3: Depth maps rendered on the office building (high) scene using self-estimated camera parameters. Each depth map is scaled to match the ground truth range, with near-far corresponding to blue-yellow. Learning-based approaches demonstrate superior performance even in the case of high soiling on reflective scene content, despite recovering less accurate poses. COLMAP is unable to accurately match large regions of non-Lambertian scene content, commonly seen in asset inspection tasks.
  • Figure 4: Qualitative camera pose results on the crane scene. COLMAP, GLOMAP, Depth Anything 3 and $\pi^3$ estimated the camera poses relatively well, whereas VGGT performed poorly.
  • Figure 5: Point clouds from each method on the bridge scene.
  • ...and 10 more figures
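
The Figure 3 caption says each depth map is scaled to match the ground truth range before colour mapping. The paper's exact procedure is not given here, so the following is only a minimal sketch of one plausible reading: an affine min/max rescale of the predicted depths onto the ground truth range, followed by normalisation to [0, 1] for a near-to-far colour map. The function names `rescale_to_range` and `normalise` are hypothetical and do not come from the paper or its code.

```python
def rescale_to_range(pred, gt):
    """Affinely map predicted depths so their min/max equal those of gt.

    Assumption: 'scaled to match the ground truth range' is read here as a
    min/max alignment; the authors may use a different alignment.
    """
    p_min, p_max = min(pred), max(pred)
    g_min, g_max = min(gt), max(gt)
    if p_max == p_min:
        # Degenerate (constant) prediction: collapse to the near plane.
        return [g_min for _ in pred]
    gain = (g_max - g_min) / (p_max - p_min)
    return [g_min + (p - p_min) * gain for p in pred]


def normalise(depth, near, far):
    """Map depths to [0, 1] (0 = near, 1 = far), clamping out-of-range values."""
    span = far - near
    return [min(1.0, max(0.0, (d - near) / span)) for d in depth]


# Toy example with flattened depth maps (values in metres, made up).
gt = [2.0, 5.0, 8.0]
pred = [1.0, 2.0, 3.0]  # correct shape, wrong scale and offset
aligned = rescale_to_range(pred, gt)
colour_index = normalise(aligned, min(gt), max(gt))
```

With this alignment, every method's depth map spans the same near-far interval, so colour differences between methods reflect relative depth structure rather than differences in absolute scale.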