Sketch2CAD: 3D CAD Model Reconstruction from 2D Sketch using Visual Transformer
Hong-Bin Yang
TL;DR
This work tackles the challenge of reconstructing editable, CAD-ready 3D models from a single 2D sketch. It introduces a visual transformer that outputs a scene descriptor containing object types and 6DoF parameters, which Rhino Grasshopper then uses to assemble B-rep CAD models. The approach is evaluated on two synthetic datasets (simple and complex), showing strong performance on simple scenes but notable difficulties with complex scenes due to occlusion and data diversity. The integration with CAD software aims to bridge sketch-based ideation and conventional design workflows, though the method is currently limited to known geometric shapes and simple geometries, which highlights areas for future improvement.
Abstract
Current 3D reconstruction methods typically generate outputs in the form of voxels, point clouds, or meshes. However, each of these formats has inherent limitations, such as rough surfaces and distorted structures. Additionally, these data types are not ideal for further manual editing and post-processing. In this paper, we present a novel 3D reconstruction method designed to overcome these disadvantages by reconstructing CAD-compatible models. We trained a visual transformer to predict a "scene descriptor" from a single 2D wire-frame image. This descriptor includes essential information, such as object types and parameters like position, rotation, and size. Using the predicted parameters, a 3D scene can be reconstructed with 3D modeling software that has programmable interfaces, such as Rhino Grasshopper, to build highly editable 3D models in boundary representation (B-rep) form. To evaluate our proposed model, we created two datasets: one consisting of simple scenes and another with more complex scenes. The test results show that the model reconstructs simple scenes accurately but struggles with more complex ones.
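To make the idea of a scene descriptor concrete, the following is a minimal sketch of how such a descriptor might be represented and serialized before being handed to a CAD scripting environment. The class names, field names, shape labels, JSON layout, and the use of Euler angles in degrees are all assumptions made for illustration; the abstract states only that the descriptor holds object types and parameters such as position, rotation, and size.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ObjectDescriptor:
    """One predicted object (hypothetical layout, not the paper's format)."""
    object_type: str  # assumed label such as "box" or "cylinder"
    position: Vec3    # x, y, z of the object in scene coordinates
    rotation: Vec3    # assumed Euler angles in degrees
    size: Vec3        # extents along the object's local axes


@dataclass
class SceneDescriptor:
    """A scene is simply a list of object descriptors in this sketch."""
    objects: List[ObjectDescriptor]

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses; tuples become JSON arrays.
        return json.dumps(asdict(self), indent=2)

    @staticmethod
    def from_json(text: str) -> "SceneDescriptor":
        data = json.loads(text)
        return SceneDescriptor(
            objects=[
                ObjectDescriptor(
                    object_type=o["object_type"],
                    position=tuple(o["position"]),
                    rotation=tuple(o["rotation"]),
                    size=tuple(o["size"]),
                )
                for o in data["objects"]
            ]
        )


# Example: a two-object scene as a model might predict it (values invented).
scene = SceneDescriptor(
    objects=[
        ObjectDescriptor("box", (0.0, 0.0, 0.0), (0.0, 0.0, 45.0), (2.0, 1.0, 1.0)),
        ObjectDescriptor("cylinder", (3.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 2.0)),
    ]
)
restored = SceneDescriptor.from_json(scene.to_json())
```

A compact, typed description like this is what makes the output editable: a script on the CAD side could read each entry and instantiate the corresponding parametric primitive, rather than receiving a fixed mesh or point cloud.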
