Sketch2CAD: 3D CAD Model Reconstruction from 2D Sketch using Visual Transformer

Hong-Bin Yang

TL;DR

This work tackles the challenge of reconstructing editable, CAD-ready 3D models from a single 2D sketch. It introduces a visual transformer that outputs a scene descriptor containing object types and 6DoF parameters, which Rhino Grasshopper then uses to assemble B-rep CAD models. The approach is evaluated on two synthetic datasets (simple and complex), showing strong performance on simple scenes but notable difficulties on complex scenes due to occlusion and data diversity. The integration with CAD software aims to bridge sketch-based ideation and conventional design workflows, though the method is currently limited to known geometric shapes and simple geometries, which points to areas for future improvement.

Abstract

Current 3D reconstruction methods typically generate outputs in the form of voxels, point clouds, or meshes. However, each of these formats has inherent limitations, such as rough surfaces and distorted structures. Additionally, these data types are not ideal for further manual editing and post-processing. In this paper, we present a novel 3D reconstruction method designed to overcome these disadvantages by reconstructing CAD-compatible models. We trained a visual transformer to predict a "scene descriptor" from a single 2D wire-frame image. This descriptor includes essential information, such as object types and parameters like position, rotation, and size. Using the predicted parameters, a 3D scene can be reconstructed with 3D modeling software that has programmable interfaces, such as Rhino Grasshopper, to build highly editable 3D models in the form of B-rep. To evaluate our proposed model, we created two datasets: one consisting of simple scenes and another with more complex scenes. The test results indicate the model's capability to accurately reconstruct simple scenes while highlighting its difficulties with more complex ones.
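
To make the idea of a "scene descriptor" concrete, the following is a minimal sketch of how such a descriptor could be represented and serialized before being handed to a CAD scripting interface. The shape names come from the Figure 4 caption; the class name `SceneObject`, the helper functions, the JSON layout, and the choice of Euler angles in degrees are illustrative assumptions and are not taken from the paper.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple

# Shape names as listed in the Figure 4 caption.
SHAPE_TYPES = ("Pyramid", "Hip", "Cube", "A-frame", "Shed", "Cylinder", "Mansard")

Vec3 = Tuple[float, float, float]


@dataclass
class SceneObject:
    """One object in a scene descriptor (hypothetical layout)."""
    shape: str
    position: Vec3   # assumed: x, y, z of the object origin
    rotation: Vec3   # assumed: Euler angles in degrees
    size: Vec3       # assumed: extents along the object's local axes

    def __post_init__(self):
        # Reject shapes outside the known vocabulary.
        if self.shape not in SHAPE_TYPES:
            raise ValueError(f"unknown shape type: {self.shape!r}")


def to_descriptor(objects: List[SceneObject]) -> str:
    """Serialize a scene to JSON text that a CAD script could read."""
    return json.dumps([asdict(obj) for obj in objects])


def from_descriptor(text: str) -> List[SceneObject]:
    """Parse JSON text back into validated scene objects."""
    return [
        SceneObject(
            shape=item["shape"],
            position=tuple(item["position"]),
            rotation=tuple(item["rotation"]),
            size=tuple(item["size"]),
        )
        for item in json.loads(text)
    ]


scene = [
    SceneObject("Cube", (0.0, 0.0, 0.0), (0.0, 0.0, 45.0), (2.0, 2.0, 3.0)),
    SceneObject("Pyramid", (0.0, 0.0, 3.0), (0.0, 0.0, 45.0), (2.0, 2.0, 1.5)),
]
descriptor = to_descriptor(scene)
```

Because each object is fully described by a type and a handful of numbers, a downstream tool can rebuild exact parametric geometry from the descriptor instead of approximating a surface, which is what keeps the result editable.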

Paper Structure (14 sections, 1 equation, 8 figures, 2 tables)

Figures (8)

  • Figure 1: An example of the uneven surface created by a 3D reconstruction method based on template-mesh deformation (Sketch2Model; Zhang et al., CVPR 2021).
  • Figure 2: Example results. The first row shows the input, a single 2D wire-frame image. The second and third rows show wire-frames rendered from the predicted and ground-truth 3D models at the same camera orientation; the 3D models are in B-rep form.
  • Figure 3: The pipeline of the proposed single image 3D model reconstruction.
  • Figure 4: All shapes that appear in the dataset. From left to right are: Pyramid, Hip, Cube, A-frame, Shed, Cylinder, and Mansard.
  • Figure 5: Three example 3D scenes from each dataset. The first row shows the simple dataset, and the second row shows the complex dataset.
  • ...and 3 more figures