Advancing Digital Twin Generation Through a Novel Simulation Framework and Quantitative Benchmarking

Jacob Rubinstein, Avi Donaty, Don Engel

TL;DR

This work tackles the lack of quantitative benchmarking for digital twin generation via photogrammetry by introducing a synthetic-data pipeline that renders images from ground-truth 3D models with programmable camera poses. The pipeline uses Blender for synthetic frame generation and Meshroom for 3D reconstruction, followed by alignment steps using the Kabsch algorithm and ICP, with evaluation via a weighted SSIM across frames. The authors demonstrate the approach on a real-world dataset, report completion rates and quality metrics, and explore how frame count and resolution affect reconstruction fidelity. They also outline future directions, including large-scale parameter sweeps and extensions to alternative 3D representations, to establish benchmarks and improve digital twin generation methods.

Abstract

The generation of 3D models from real-world objects has often been accomplished through photogrammetry, i.e., by taking 2D photos from a variety of perspectives and then triangulating matched point-based features to create a textured mesh. Many design choices exist within this framework for the generation of digital twins, and differences between such approaches are largely judged qualitatively. Here, we present and test a novel pipeline for generating synthetic images from high-quality 3D models and programmatically generated camera poses. This enables a wide variety of repeatable, quantifiable experiments which can compare ground-truth knowledge of virtual camera parameters and of virtual objects against the reconstructed estimations of those perspectives and subjects.
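
The alignment step named in the TL;DR relies on the Kabsch algorithm before ICP refinement. As a reminder of what Kabsch computes, here is its standard form for $n$ corresponding 3D points $p_i$ (for example, reconstructed positions) and $q_i$ (their ground-truth counterparts). The symbols are notation chosen for this sketch, not taken from the paper, and the plain form below estimates only a rotation $R$ and translation $t$; any scale correction a photogrammetric reconstruction might need is outside it.

$$
\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i, \qquad
\bar{q} = \frac{1}{n}\sum_{i=1}^{n} q_i, \qquad
H = \sum_{i=1}^{n} (p_i - \bar{p})(q_i - \bar{q})^{\top}
$$

$$
H = U \Sigma V^{\top}, \qquad
d = \operatorname{sign}\!\left(\det(V U^{\top})\right), \qquad
R = V \,\operatorname{diag}(1, 1, d)\, U^{\top}, \qquad
t = \bar{q} - R\,\bar{p}
$$

The resulting $R$ and $t$ minimize $\sum_i \lVert R\,p_i + t - q_i \rVert^2$, and the factor $d$ keeps $R$ a proper rotation rather than a reflection.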

Paper Structure (19 sections, 7 equations, 6 figures)

This paper contains 19 sections, 7 equations, 6 figures.

Figures (6)

  • Figure 1: In green: the smallest enclosing sphere (SES), found using Welzl's algorithm. $R_{\textrm{SES}}$ is the radius of the SES, $\theta$ is half of Blender's default vertical field of view, and $R_{\textrm{CAM}}$ is the radius of the camera sphere (the quantity being solved for).
  • Figure 2: Original models, followed by reconstructed models, using the same camera positions. Overall, the reconstructions were similar to the original models, but certain common model traits caused issues. Red arrows demonstrate differences.
  • Figure 3: Distribution of SSIM values across the entire dataset. The x-axis shows the SSIM score; the y-axis shows the number of models with that score.
  • Figure 4: An example of a low-quality reconstruction from three matching perspectives.
  • Figure 5: Histogram of results with 70, 100, and 130 frames for reconstruction. The x-axis shows the name of the model; the y-axis shows the SSIM score.
  • ...and 1 more figure
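
The Figure 1 caption names three quantities but not the relation between them. One reading consistent with the geometry is that the SES should just touch the top and bottom of the view cone, which gives $R_{\textrm{CAM}} = R_{\textrm{SES}} / \sin\theta$. The sketch below works under that assumption; the function names, the Fibonacci-lattice placement of cameras, and the example field of view are all illustrative choices, not details confirmed by the paper.

```python
import math


def camera_sphere_radius(r_ses: float, vertical_fov_deg: float) -> float:
    """Distance from the SES centre at which the sphere exactly fills the
    vertical field of view (assumed relation: R_CAM = R_SES / sin(theta))."""
    if r_ses <= 0:
        raise ValueError("r_ses must be positive")
    if not 0 < vertical_fov_deg < 180:
        raise ValueError("vertical_fov_deg must be in (0, 180)")
    theta = math.radians(vertical_fov_deg) / 2.0  # half-angle, as in Figure 1
    return r_ses / math.sin(theta)


def camera_positions(center, r_cam, n):
    """Place n cameras on a sphere of radius r_cam around `center`.

    Uses a Fibonacci lattice, a hypothetical choice made here to get
    roughly even coverage; the paper's actual pose scheme may differ.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    cx, cy, cz = center
    positions = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # evenly spaced heights in (-1, 1)
        ring = math.sqrt(max(0.0, 1.0 - z * z))  # radius of the ring at height z
        phi = i * golden_angle                   # azimuth advances by the golden angle
        positions.append((
            cx + r_cam * ring * math.cos(phi),
            cy + r_cam * ring * math.sin(phi),
            cz + r_cam * z,
        ))
    return positions


if __name__ == "__main__":
    # Example values only: a unit SES and a 60-degree vertical field of view.
    r_cam = camera_sphere_radius(1.0, 60.0)
    for pos in camera_positions((0.0, 0.0, 0.0), r_cam, 5):
        print(tuple(round(c, 3) for c in pos))
```

Using the sine rather than the tangent of $\theta$ makes the view cone tangent to the sphere, so no part of the object inside the SES is clipped vertically. In a real setup the field of view would be read from the camera settings rather than hard-coded.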