CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images

Cheng Chen, Jiacheng Wei, Tianrun Chen, Chi Zhang, Xiaofeng Yang, Shangzhan Zhang, Bingchen Yang, Chuan-Sheng Foo, Guosheng Lin, Qixing Huang, Fayao Liu

TL;DR

CADCrafter is a latent diffusion framework that converts unconstrained images into editable parametric CAD command sequences by conditioning on geometry features extracted from depth and normal maps. Trained solely on synthetic data, it combines a geometry-conditioned diffusion model, multi-view to single-view distillation, and direct preference optimization (DPO) fine-tuning guided by an automatic code checker that enforces geometric validity. The approach outperforms baselines on both synthetic and real-world data in command/parameter accuracy, geometric fidelity, and output validity, while supporting diverse single-view generation and generalizing to real-world images. By bridging the synthetic-to-real domain gap through geometry-aware conditioning and compiler-guided fine-tuning, it makes CAD generation from casual imagery practical, with potential applications in scalable digital twins and manufacturing workflows.

Abstract

Creating CAD digital twins from the physical world is crucial for manufacturing, design, and simulation. However, current methods typically rely on costly 3D scanning with labor-intensive post-processing. To provide a user-friendly design process, we explore the problem of reverse engineering from unconstrained real-world CAD images that can be easily captured by users of all experience levels. The scarcity of real-world CAD data, however, makes it difficult to train such models directly. To tackle this challenge, we propose CADCrafter, an image-to-parametric CAD model generation framework that is trained solely on synthetic textureless CAD data and tested on real-world images. To bridge the significant representation disparity between images and parametric CAD models, we introduce a geometry encoder to accurately capture diverse geometric features. Moreover, the texture-invariant properties of the geometric features facilitate generalization to real-world scenarios. Since compiling CAD parameter sequences into explicit CAD models is a non-differentiable process, network training inherently lacks explicit geometric supervision. To impose geometric validity constraints, we employ direct preference optimization (DPO) to fine-tune our model with automatic code checker feedback on CAD sequence quality. Furthermore, we collected a real-world dataset, comprising multi-view images paired with corresponding CAD command sequences, to evaluate our method. Experimental results demonstrate that our approach can robustly handle real unconstrained CAD images and even generalize to unseen general objects.
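As a rough illustration of how checker feedback can drive preference fine-tuning, the sketch below pairs sequences that a validity check accepts with sequences it rejects, then scores one pair with the standard DPO objective. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the `is_valid` callback, the `beta` value, and the use of plain sequence log-probabilities (rather than a diffusion-specific variant of the loss) are all hypothetical choices made for illustration.

```python
import math


def build_preference_pairs(samples, is_valid):
    """Pair every sample the checker accepts with every sample it rejects.

    `is_valid` is a hypothetical stand-in for a compile-based code checker.
    """
    accepted = [s for s in samples if is_valid(s)]
    rejected = [s for s in samples if not is_valid(s)]
    return [(good, bad) for good in accepted for bad in rejected]


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    Inputs are log-probabilities under the model being tuned (logp_*) and
    under a frozen reference model (ref_logp_*). `beta` is an assumed value.
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)) == softplus(-margin), computed in a stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the tuned model and the reference agree, the margin is zero and the loss equals log 2; raising the relative likelihood of the accepted sequence lowers it.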

Paper Structure

This paper contains 27 sections, 5 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Our proposed CADCrafter can generate CAD command sequences from unconstrained multi-domain images, including (from left to right) synthetic data renderings, 3D-printed CAD models, and unseen general objects. These generated CAD commands can then be compiled into 3D CAD models. Notably, our model is trained solely on synthetic data renderings.
  • Figure 2: The training pipeline comprises three stages. First, a transformer autoencoder is trained to reconstruct CAD command sequences, embedding them into a latent space. Second, we extract depth and normal maps using a pre-trained geometric extractor, and the encoded features serve as conditions for the latent diffusion model; the multi-view geometry encoder and the latent diffusion model are trained jointly. A single-view geometry encoder is then trained by distilling knowledge from the multi-view encoder to enhance robustness. Third, we develop a geometric-validity-based code checker and fine-tune the diffusion model with direct preference optimization (DPO) to improve generation quality and accuracy.
  • Figure 3: The code checker tests whether a generated CAD command sequence is compilable. The first row illustrates cases that compile successfully, while the second row shows invalid cases in which the curves do not enclose a 2D profile. The compiler thus serves as an automatic checker for our DPO fine-tuning process.
  • Figure 4: We showcase our RealCAD dataset: (a) casually captured multi-view images of a 3D-printed CAD model, (b) more examples of 3D-printed CAD models freely captured with iPhones.
  • Figure 5: We compare the generated CAD models from single-view images with existing methods on two datasets: the upper part shows results on the DeepCAD renderings, and the lower part shows results on the real-world RealCAD dataset.
  • ...and 4 more figures
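
The invalid cases described in the Figure 3 caption, where the curves fail to enclose a 2D profile, can be approximated with a simple endpoint test. The sketch below is a toy proxy under stated assumptions and is not the paper's checker, which relies on the CAD compiler itself: each curve is assumed to be a hypothetical `(start, end)` pair of 2D points, and `is_closed_loop` and `tol` are names introduced here for illustration.

```python
import math


def is_closed_loop(curves, tol=1e-6):
    """Return True if each curve ends where the next begins, wrapping around.

    `curves` is a list of (start, end) pairs of 2D points; this format is an
    assumption made for the sketch, not a real CAD command representation.
    """
    if not curves:
        return False
    for i, (_, end) in enumerate(curves):
        # The curve after the last one is the first, which closes the loop.
        next_start = curves[(i + 1) % len(curves)][0]
        if math.dist(end, next_start) > tol:
            return False
    return True


square = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((1, 1), (0, 1)), ((0, 1), (0, 0))]
open_path = square[:3]  # the last edge is missing, so no profile is enclosed
```

Here `is_closed_loop(square)` holds while `is_closed_loop(open_path)` does not. A real compile check would also have to reject self-intersections and degenerate profiles, which this test ignores.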