Scalable Cloud-Native Pipeline for Efficient 3D Model Reconstruction from Monocular Smartphone Images

Potito Aghilar, Vito Walter Anelli, Michelantonio Trizio, Tommaso Di Noia

TL;DR

The paper addresses scalable 3D reconstruction from monocular smartphone images by proposing a cloud-native, microservices-based pipeline tailored for Digital Twins in Industry 4.0. It integrates an ARCore-based pose recorder for data collection, CarveKit for alpha-masks, and NVIDIA nvdiffrec for mesh and texture reconstruction, all orchestrated in a Kubernetes environment with MinIO storage. Key contributions include a custom pose compensation algorithm using quaternions, a dataset preprocessing and conditioning flow, and a modular cloud architecture that separates data preprocessing, reconstruction, and scheduling tasks. The results demonstrate end-to-end feasibility with measurable latency components and reconstruction quality metrics, highlighting practical potential for rapid, scalable, phone-to-cloud 3D model generation and deployment in industrial training and visualization contexts.

Abstract

In recent years, 3D models have gained popularity in various fields, including entertainment, manufacturing, and simulation. However, manually creating these models can be a time-consuming and resource-intensive process, making it impractical for large-scale industrial applications. To address this issue, researchers are exploiting Artificial Intelligence and Machine Learning algorithms to automatically generate 3D models effortlessly. In this paper, we present a novel cloud-native pipeline that can automatically reconstruct 3D models from monocular 2D images captured using a smartphone camera. Our goal is to provide an efficient and easily-adoptable solution that meets the Industry 4.0 standards for creating a Digital Twin model, which could enhance personnel expertise through accelerated training. We leverage machine learning models developed by NVIDIA Research Labs alongside a custom-designed pose recorder with a unique pose compensation component based on the ARCore framework by Google. Our solution produces a reusable 3D model, with embedded materials and textures, exportable and customizable in any external 3D modelling software or 3D engine. Furthermore, the whole workflow is implemented by adopting the microservices architecture standard, enabling each component of the pipeline to operate as a standalone replaceable module.

Paper Structure (22 sections, 11 equations, 6 figures)

Figures (6)

  • Figure 1: A graphical representation of the proposed pipeline. (a) describes the sequence of operations required to achieve the expected result. (b) illustrates the data flow between the intermediate stages of the pipeline.
  • Figure 2: Comparison of the poses extracted by our solution (a) with those extracted by COLMAP (b). COLMAP lacks a real-world reference during pose extraction, so the two sets of poses in (a) and (b) do not overlap.
  • Figure 3: (a) shows a partial view of the compensation matrix generated at run-time. (b) and (c) show how applying the compensation matrix changes the reconstruction; in both cases, the reference image is placed alongside to highlight the differences.
  • Figure 4: Pipeline architecture with the Workloads scheduler, Preprocessor, and Reconstruction microservices. The pipeline workflow is partitioned between local and cloud execution. All stages communicate with the S3 storage layer to cache intermediate outputs and the final reconstructed 3D model.
  • Figure 5: Android application during the camera selection (a), data acquisition (b), and reconstruction (c) phases. The whole pipeline workflow is transparent to the end user, who is notified of its status through feedback in the User Interface (UI). (d) and (e) depict two reconstruction attempts with their respective reference images.
  • ...and 1 more figure
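
The TL;DR and Figure 3 refer to a quaternion-based pose compensation step, but this summary does not give its details. The following is only a minimal sketch of the general idea, composing a recorded camera rotation with a corrective rotation via the Hamilton product. The function names, the (x, y, z, w) component order, the right-multiplication convention, and the 90-degree offset are all assumptions for illustration and are not taken from the paper.

```python
import math


def quat_multiply(a, b):
    """Hamilton product a * b of two quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )


def axis_angle_quat(axis, degrees):
    """Unit quaternion for a rotation of `degrees` about a unit-length `axis`."""
    half = math.radians(degrees) / 2.0
    s = math.sin(half)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(half))


def compensate(pose_rotation, correction):
    """Apply `correction` in the camera's local frame (assumed convention)."""
    return quat_multiply(pose_rotation, correction)


# Hypothetical case: the recorded rotation is off by 90 degrees about Z,
# so the correction is the inverse rotation about the same axis.
recorded = axis_angle_quat((0.0, 0.0, 1.0), 90.0)
correction = axis_angle_quat((0.0, 0.0, 1.0), -90.0)
compensated = compensate(recorded, correction)
```

In this toy case the correction exactly undoes the offset, so `compensated` is the identity rotation up to floating-point error. A real compensation would depend on the device orientation and on the coordinate conventions expected by the reconstruction stage.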