
OpenECAD: An Efficient Visual Language Model for Editable 3D-CAD Design

Zhe Yuan, Jianqi Shi, Yanhong Huang

TL;DR

OpenECAD introduces a family of compact visual-language models that generate editable CAD operation sequences from images. By designing a dedicated image–code dataset and a CAD-operational code format, and by fine-tuning lightweight vision-language backbones with LoRA, OpenECAD produces executable, partly accurate CAD code for simple designs and demonstrates the potential to integrate with standard CAD tools. The work provides a practical pipeline for image-to-CAD generation, covering dataset design, natural-language annotations, multi-view rendering, and rigorous CAD-specific evaluation; among the tested models, the 2.4B model yields the best overall results. The approach holds promise for interactive CAD assistance and rapid design iteration, while leaving room for improvement in geometry understanding and the handling of complex sketches.
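The TL;DR refers to LoRA fine-tuning of the vision-language backbones. As a minimal, framework-free sketch of the standard LoRA formulation, h = W x + (alpha / r) B A x, the class below keeps the pre-trained weight W frozen and trains only the low-rank factors A and B. The class name `LoRALinear` and the `rank`/`alpha` values are illustrative assumptions, not OpenECAD's training code or hyperparameters.

```python
import random


class LoRALinear:
    """Frozen weight W (out x in) plus a trainable low-rank update B @ A scaled by alpha / rank."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        self.weight = weight  # frozen pre-trained weight, stored as a list of rows
        out_dim, in_dim = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A (rank x in) starts with small random values; B (out x rank) starts at zero,
        # so the adapter contributes nothing until B is trained.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    @staticmethod
    def _matvec(m, v):
        # Plain matrix-vector product on nested lists.
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def forward(self, x):
        base = self._matvec(self.weight, x)                    # W x (frozen path)
        delta = self._matvec(self.B, self._matvec(self.A, x))  # B (A x) (trainable path)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B are updated during fine-tuning; W is never touched.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: before training, the adapted layer reproduces the frozen layer exactly.
layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
print(layer.forward([1.0, 1.0]), layer.trainable_params())
```

Initializing B to zero is the usual LoRA choice: fine-tuning starts from the unchanged pre-trained model, and only rank x (in + out) parameters per adapted layer are trained instead of the full in x out matrix.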

Abstract

Computer-aided design (CAD) tools are utilized in the manufacturing industry for modeling everything from cups to spacecraft. These programs are complex to use and typically require years of training and experience to master. Structured and well-constrained 2D sketches and 3D constructions are crucial components of CAD modeling. A well-executed CAD model can be seamlessly integrated into the manufacturing process, thereby enhancing production efficiency. Deep generative models of 3D shapes and 3D object reconstruction models have garnered significant research interest. However, most of these models produce discrete forms of 3D objects that are not editable. Moreover, the few models based on CAD operations often have substantial input restrictions. In this work, we fine-tuned pre-trained models to create OpenECAD models (0.55B, 0.89B, 2.4B and 3.1B), leveraging the visual, logical, coding, and general capabilities of visual language models. OpenECAD models can process images of 3D designs as input and generate highly structured 2D sketches and 3D construction commands, ensuring that the designs are editable. These outputs can be directly used with existing CAD tools' APIs to generate project files. To train our network, we created a series of OpenECAD datasets. These datasets are derived from existing public CAD datasets, adjusted and augmented to meet the specific requirements of vision language model (VLM) training. Additionally, we have introduced an approach that utilizes dependency relationships to define and generate sketches, further enriching the content and functionality of the datasets.
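The abstract describes emitting structured 2D sketches followed by 3D construction commands, and defining sketches through dependency relationships. The sketch below illustrates that general idea under our own assumptions: `Line`, `Loop`, `Extrude`, and the textual command strings are hypothetical and are neither OpenECAD's actual code format nor any CAD tool's API. Each new line inherits its start point from the previous curve, so only end points have to be written, whereas an absolute definition repeats both endpoints of every curve.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Line:
    start: Point
    end: Point


@dataclass
class Loop:
    """Hypothetical closed loop whose lines are chained by dependency on the previous curve."""
    origin: Point
    lines: List[Line] = field(default_factory=list)

    def current_point(self) -> Point:
        # The point the next curve depends on: last end point, or the origin if empty.
        return self.lines[-1].end if self.lines else self.origin

    def line_to(self, end: Point) -> "Loop":
        # Only the end point is given; the start is taken from the previous curve.
        self.lines.append(Line(self.current_point(), end))
        return self

    def close(self) -> "Loop":
        # The closing line depends on the last end point and the loop origin.
        if self.current_point() != self.origin:
            self.lines.append(Line(self.current_point(), self.origin))
        return self


@dataclass
class Extrude:
    """Hypothetical 3D construction step that extrudes a closed loop by a distance."""
    loop: Loop
    distance: float

    def to_commands(self) -> List[str]:
        # Flatten the sketch-then-extrude step into an ordered command sequence.
        cmds = [f"Line({ln.start} -> {ln.end})" for ln in self.loop.lines]
        cmds.append(f"Extrude(distance={self.distance})")
        return cmds


# Usage: a 4 x 2 rectangle defined by three end points plus close(), then extruded.
rect = Loop(origin=(0.0, 0.0)).line_to((4.0, 0.0)).line_to((4.0, 2.0)).line_to((0.0, 2.0)).close()
for cmd in Extrude(rect, 1.5).to_commands():
    print(cmd)
```

Because every curve is anchored to its predecessor, a loop written this way is closed and connected by construction, which is one plausible reason to prefer dependency-based definitions when a model generates sketches token by token.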

Paper Structure

This paper contains 31 sections, 1 equation, 9 figures, 6 tables, and 3 algorithms.

Figures (9)

  • Figure 1: Example of Extrusion Feature Addition Using an Existing Face as Sketch Reference Plane.
  • Figure 2: Overview of the OpenECAD Dataset and Model.
  • Figure 3: Comparison Diagram of Two Definitions for Line, Arc, and Circle.
  • Figure 4: The statistical distribution of the number of "Sketch-Extrusion" steps in OpenECAD datasets.
  • Figure 5: The partial loss curves for the OpenECAD 0.55B, 0.89B, 2.4B, and 3.1B models during training.