3D VR Sketch Guided 3D Shape Prototyping and Exploration

Ling Luo, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song, Yulia Gryaditskaya

TL;DR

This work tackles the challenge of converting sparse, novice-level 3D VR sketches into multiple plausible 3D shapes for a given category. It introduces a two-stage framework: first, a deterministic shape auto-decoder learns to reconstruct shapes, represented as truncated signed distance functions (SDFs), from sketch-aligned latent codes; second, a conditional normalizing flow (CNF) in the latent space generates diverse shape samples conditioned on the sketch embedding. The model uses several losses to align sketches with shapes, including a dedicated sketch fidelity loss and contrastive latent-space alignment, which keep it robust despite limited data and misalignment between sketches and reference shapes. Empirically, the approach achieves good sketch fidelity and meaningful diversity, outperforms retrieval baselines when inputs are sparse or novel, and demonstrates smooth interpolation in the sampling space for design exploration.
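To make the second stage more concrete, here is a minimal toy sketch of sampling latent codes from a flow conditioned on a sketch embedding. It is not the authors' model: it assumes a single conditional affine layer with randomly initialised weights, and every name in it (`LATENT_DIM`, `COND_DIM`, `W_shift`, `W_log_scale`, `flow_forward`, `flow_inverse`, `log_prob`) is hypothetical. A real system would stack several learned invertible layers and pass each sampled code to the shape decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8  # hypothetical size of a shape latent code
COND_DIM = 4    # hypothetical size of a sketch embedding

# Hypothetical conditioner weights; a trained model would learn these.
W_shift = rng.normal(size=(COND_DIM, LATENT_DIM))
W_log_scale = 0.1 * rng.normal(size=(COND_DIM, LATENT_DIM))


def flow_forward(eps, cond):
    """Map base noise eps to a latent code, conditioned on the sketch embedding."""
    shift = cond @ W_shift
    log_scale = cond @ W_log_scale
    return shift + np.exp(log_scale) * eps


def flow_inverse(z, cond):
    """Map a latent code back to base noise; also return log|det(d eps / d z)|."""
    shift = cond @ W_shift
    log_scale = cond @ W_log_scale
    eps = (z - shift) * np.exp(-log_scale)
    return eps, -np.sum(log_scale)


def log_prob(z, cond):
    """Conditional log-density of z via the change-of-variables formula."""
    eps, log_det = flow_inverse(z, cond)
    base = -0.5 * np.sum(eps ** 2) - 0.5 * LATENT_DIM * np.log(2.0 * np.pi)
    return base + log_det


sketch_embedding = rng.normal(size=COND_DIM)

# Zero noise maps to the mean of this affine conditional distribution.
mean_code = flow_forward(np.zeros(LATENT_DIM), sketch_embedding)

# Different noise draws give different latent codes for the same sketch.
samples = [flow_forward(rng.normal(size=LATENT_DIM), sketch_embedding)
           for _ in range(5)]

# Interpolating in the noise (sampling) space traces a path between two codes.
eps_a = rng.normal(size=LATENT_DIM)
eps_b = rng.normal(size=LATENT_DIM)
path = [flow_forward((1 - t) * eps_a + t * eps_b, sketch_embedding)
        for t in np.linspace(0.0, 1.0, 5)]
```

Because the mapping is invertible, the same layer supports both sampling (`flow_forward`) and likelihood evaluation (`log_prob`). That is the general property that lets a normalizing flow be trained by maximum likelihood on latent codes.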

Abstract

3D shape modeling is labor-intensive, time-consuming, and requires years of expertise. To facilitate 3D shape modeling, we propose a 3D shape generation network that takes a 3D VR sketch as a condition. We assume that sketches are created by novices without art training and aim to reconstruct geometrically realistic 3D shapes of a given category. To handle potential sketch ambiguity, our method creates multiple 3D shapes that align with the original sketch's structure. We carefully design our method, training the model step-by-step and leveraging multi-modal 3D shape representation to support training with limited training data. To guarantee the realism of generated 3D shapes we leverage the normalizing flow that models the distribution of the latent space of 3D shapes. To encourage the fidelity of the generated 3D shapes to an input sketch, we propose a dedicated loss that we deploy at different stages of the training process. The code is available at https://github.com/Rowl1ng/3Dsketch2shape.

Paper Structure (46 sections, 15 equations, 11 figures, 5 tables)

Figures (11)

  • Figure 1: Given a VR (Virtual Reality) sketch input, we generate 3D shape samples that satisfy three requirements: (1 - fidelity) reconstructed shapes follow the overall structure of a quick VR sketch; (2 - diversity) reconstructed shapes vary in shape details, such as a hollow or solid backrest; and (3 - realism) reconstructions favor geometrically realistic 3D shapes of a given category.
  • Figure 2: Example of misalignment and ambiguity in 3D sketches. Misalignment: the collected sketches deviate from their reference shapes in the position and proportion of their parts. Ambiguity: because sketches are sparse and abstract, strokes can be interpreted differently. For example, the strokes of a cube can represent either slender bars or a closed solid shape.
  • Figure 3: Our method consists of two stages: (a) the first stage produces deterministic 3D shape reconstructions from input sketches (see the decoder and encoder sections), while (b) the second stage enables conditional 3D shape sample generation (see the generation section). The auto-encoder (AE) is trained in three steps: first, the auto-decoder is trained; then the shape encoder is trained; and finally, the encoder is fine-tuned to jointly encode sketches and shapes.
  • Figure 4: Generation results. 'Ref.' shows the reference 3D shape. 'AE' shows the deterministic prediction by our AE from the first stage of our method. 'Mean' denotes the shape reconstructed from the sample corresponding to the mean of the conditional distribution. Finally, we show 5 randomly generated shapes conditioned on the input sketch, sorted by fidelity to the reference shape.
  • Figure 5: Comparison of the generated samples conditioned on the input sketch when $\mathcal{L}_{\text{sketch, CNF}}$ is used (purple) or not (green). This example shows that the sketch fidelity loss indeed results in better fidelity to a sketch input: all generated shapes when the loss is used contain handles and better respect the shape of chair legs/support. 'Mean' denotes the shape reconstructed from the sample corresponding to the mean of the conditional distribution.
  • ...and 6 more figures