A Concept for Reconstructing Stucco Statues from historic Sketches using synthetic Data only

Thomas Pöllabauer, Julius Kühn

TL;DR

The paper tackles reconstructing destroyed medieval statues from sinopia sketches by training entirely on synthetic data to enable on-site, real-time 3D reconstruction. It introduces a 3-stage pipeline: (1) synthetic data generation from a diverse statue corpus, (2) image-to-image translation to bridge the gap between synthetic renders and historic sketches, and (3) an encoder–decoder network that outputs RGB, depth, normals, and a silhouette mask. A gradient-reversal regularized encoder–decoder with a residual backbone is trained on 39,600 samples to generalize to unseen statues. Preliminary results on both synthetic test objects and real sinopia indicate feasible shape recovery, though fine details such as clothing remain challenging, pointing to domain-gap limitations and avenues for future domain-knowledge integration and full 3D extension.

Abstract

In medieval times, stucco workers used a red pigment called sinopia to first sketch the statue to be made on the wall. Today, many of these statues are destroyed, but the original drawings, themselves called sinopia after the red color, let us reconstruct how the final statue might have looked. We propose a fully automated approach to reconstruct a point cloud and show preliminary results by generating a color image, a depth map, and surface normals from a single sketch, without requiring a collection of other, similar samples. Our proposed solution allows real-time reconstruction on-site, for instance within an exhibition, or can generate a useful starting point for an expert trying to manually reconstruct the statue, all while using only synthetic data for training.

Paper Structure (8 sections, 2 figures, 1 table)

This paper contains 8 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of our approach. We solve the problem in 3 stages: First, we collect unrelated statues depicting humans. We use this data to generate images, which we reduce to mere line drawings, making them less distinguishable from our real samples. Second, we train a highly expressive deep neural network to reconstruct high-quality information from the reduced representation. Third, we apply our image translation to our handful of real-world samples before estimating the 3D shape. To make the pipeline end-to-end learnable, the mapping function between 2D and 3D information should ideally be differentiable.
  • Figure 2: Preliminary qualitative results on our sinopia. Row 1 shows all outputs of our process, from left to right: first, the restoration by an expert (Prof. Dr. Stiegemann), used as input to our network, followed by the mask, normals, depth, and color estimate of our approach. Row 2 shows input-output pairs of additional sketches and their color reconstruction.
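
The captions above mention a mapping between 2D and 3D information and a network that outputs a depth map and a silhouette mask. The source does not say how the point cloud is derived from these outputs, so the following is only a minimal sketch of one common option: back-projecting the masked depth map through a pinhole camera. The function name `depth_to_point_cloud`, the `focal_length` parameter, and the centered principal point are illustrative assumptions, not the authors' method.

```python
import numpy as np


def depth_to_point_cloud(depth, mask, focal_length):
    """Back-project a depth map into camera-space 3D points.

    Hypothetical helper: assumes a pinhole camera whose principal point
    sits at the image center and a single focal length in pixels.
    None of these camera parameters are given in the source.
    """
    height, width = depth.shape
    # Assumed principal point: the center of the image.
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    # Pinhole back-projection: scale the centered pixel offsets by depth.
    x = (u - cx) * depth / focal_length
    y = (v - cy) * depth / focal_length
    points = np.stack([x, y, depth], axis=-1)  # shape (H, W, 3)
    # Keep only pixels inside the silhouette mask, in row-major order.
    return points[mask.astype(bool)]  # shape (N, 3)


# Toy example: a flat 3x3 depth map with two foreground pixels.
depth = np.full((3, 3), 2.0)
mask = np.zeros((3, 3), dtype=bool)
mask[0, 0] = True  # top-left corner
mask[1, 1] = True  # image center
cloud = depth_to_point_cloud(depth, mask, focal_length=1.0)
```

The back-projection uses only multiplications and divisions of the depth values, so it is differentiable with respect to depth, which is the property the Figure 1 caption asks of the 2D-3D mapping. The mask selection is a hard, non-differentiable choice in this sketch.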