CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation

Anna C. Doris, Md Ferdous Alam, Amin Heyrani Nobari, Faez Ahmed

TL;DR

CAD-Coder is an open-source vision-language model fine-tuned to generate editable CadQuery CAD code from images, addressing the gap between visual inputs and parametric CAD programming. The model is trained with a two-stage pipeline on GenCAD-Code and outputs CadQuery code, achieving a valid syntax rate of $\mathrm{VSR}=100\%$ and $\mathrm{IOU}_{\mathrm{best}}=0.675$ on a large test set while showing promising generalization to real-world images and unseen CAD operations. The GenCAD-Code dataset, derived from GenCAD, provides a substantial set of image–CadQuery pairs for training and evaluating image-conditioned CAD code generation. The work demonstrates the potential of domain-specific fine-tuning of vision-language models to streamline engineering design workflows, with a public release and clear avenues for improving robustness and broadening coverage of CAD operations.

Abstract

Efficient creation of accurate and editable 3D CAD models is critical in engineering design, significantly impacting cost and time-to-market in product innovation. Current manual workflows remain highly time-consuming and demand extensive user expertise. While recent developments in AI-driven CAD generation show promise, existing models are limited by incomplete representations of CAD operations, inability to generalize to real-world images, and low output accuracy. This paper introduces CAD-Coder, an open-source Vision-Language Model (VLM) explicitly fine-tuned to generate editable CAD code (CadQuery Python) directly from visual input. Leveraging a novel dataset that we created--GenCAD-Code, consisting of over 163k CAD-model image and code pairs--CAD-Coder outperforms state-of-the-art VLM baselines such as GPT-4.5 and Qwen2.5-VL-72B, achieving a 100% valid syntax rate and the highest accuracy in 3D solid similarity. Notably, our VLM demonstrates some signs of generalizability, successfully generating CAD code from real-world images and executing CAD operations unseen during fine-tuning. The performance and adaptability of CAD-Coder highlight the potential of VLMs fine-tuned on code to streamline CAD workflows for engineers and designers. CAD-Coder is publicly available at: https://github.com/anniedoris/CAD-Coder.
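
To make the "valid syntax rate" concrete, here is a minimal sketch of a syntax-only check over generated scripts. It is an assumption-laden proxy: the helper names `is_valid_python` and `valid_syntax_rate` and the two sample scripts are hypothetical, and the paper's own metric may additionally require that the script executes in CadQuery and yields a solid, which this sketch does not test.

```python
import ast


def is_valid_python(script: str) -> bool:
    """Return True if the script parses as Python (syntax only; nothing is executed)."""
    try:
        ast.parse(script)
        return True
    except SyntaxError:
        return False


def valid_syntax_rate(scripts: list[str]) -> float:
    """Fraction of scripts that parse; a proxy, not the paper's exact definition."""
    if not scripts:
        return 0.0
    return sum(is_valid_python(s) for s in scripts) / len(scripts)


# Hypothetical model outputs: the first is a plausible CadQuery script,
# the second is cut off and leaves a parenthesis unclosed.
generated = [
    'import cadquery as cq\n'
    'result = cq.Workplane("XY").box(10, 20, 5).edges("|Z").fillet(1.0)\n',
    'import cadquery as cq\n'
    'result = cq.Workplane("XY").box(10, 20, 5\n',
]

rate = valid_syntax_rate(generated)
print(f"valid syntax rate: {rate:.2f}")
```

Parsing with `ast` avoids running untrusted generated code; a stricter check would execute each script in a sandbox with CadQuery installed and confirm that a solid is produced.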

Paper Structure

This paper contains 27 sections, 2 theorems, 37 equations, 5 figures, 3 tables.

Key Result

Lemma 1

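Let $\Omega_1 \subset \mathbb{R}^3$ be a bounded solid with nonzero volume and let $\Omega_2 = T(\Omega_1)$ be its image under an affine transformation $T(\mathbf{x}) = s\mathbf{R}\mathbf{x} + \mathbf{t}$, where $\mathbf{R} \in SO(3)$ (so that $\det(\mathbf{R}) = 1$), $s > 0$, and $\mathbf{t} \in \mathbb{R}^3$. Then the mapping $T:\Omega_1\to\Omega_2$ is a bijection satisfying the following relative volume preservation property: for any integrable function $g:\Omega_2\to\mathbb{R}$,

$$\frac{1}{\mathrm{Vol}(\Omega_2)}\int_{\Omega_2} g(\mathbf{y})\,d\mathbf{y} = \frac{1}{\mathrm{Vol}(\Omega_1)}\int_{\Omega_1} g(T(\mathbf{x}))\,d\mathbf{x}.$$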
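
A short sketch of why relative volume is preserved under such a map, assuming the transformation is written $T(\mathbf{x}) = s\mathbf{R}\mathbf{x} + \mathbf{t}$ and $\mathrm{Vol}$ denotes volume (notation chosen here for illustration, not necessarily the paper's). The Jacobian of $T$ is the constant $s^3$, so it cancels between the integral and the normalizing volume:

```latex
\begin{align*}
\mathbf{y} = T(\mathbf{x}) = s\mathbf{R}\mathbf{x} + \mathbf{t}
  &\;\Longrightarrow\; d\mathbf{y} = |\det(s\mathbf{R})|\,d\mathbf{x} = s^{3}\,d\mathbf{x}, \\
\mathrm{Vol}(\Omega_2) &= \int_{\Omega_1} s^{3}\,d\mathbf{x} = s^{3}\,\mathrm{Vol}(\Omega_1), \\
\frac{1}{\mathrm{Vol}(\Omega_2)}\int_{\Omega_2} g(\mathbf{y})\,d\mathbf{y}
  &= \frac{s^{3}}{s^{3}\,\mathrm{Vol}(\Omega_1)}\int_{\Omega_1} g(T(\mathbf{x}))\,d\mathbf{x}
   = \frac{1}{\mathrm{Vol}(\Omega_1)}\int_{\Omega_1} g(T(\mathbf{x}))\,d\mathbf{x}.
\end{align*}
```

Taking $g$ to be the indicator function of a second solid shows that volume ratios, and hence intersection-over-union scores, are unchanged when both solids undergo the same rotation, uniform scaling, and translation.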

Figures (5)

  • Figure 1: Overview of CAD-Coder. The VLM accepts an image as input and outputs CadQuery code, which can be run as a Python script to produce an editable, 3D solid CAD model. CAD-Coder has a LLaVA 1.5-type architecture and is fine-tuned on the GenCAD-Code dataset.
  • Figure 2: Distribution of token counts for the CadQuery scripts in our GenCAD-Code dataset.
  • Figure 3: Two examples comparing CAD-Coder's generated solids with baseline generated solids. The IOUbest score quantifies a solid's similarity to a ground truth solid, where an IOUbest of 1 is a perfect score. The solids are depicted in their alignments that yield the IOUbest score.
  • Figure 4: We test CAD-Coder's generalizability to real-image-conditioned CAD generation, a task not included in the fine-tuning dataset. 1st row: we 3D print several objects from GenCAD-Code's test set and photograph them at approximately isometric views. 2nd row: CAD-Coder's real-image-conditioned CAD generation. 3rd row: CAD-Coder's rendered-CAD image-conditioned CAD generation. 4th row: ground truth solids.
  • Figure 5: Examples of CAD-Coder variants attempting to add fillets to CAD solids. The figure compares the performance of CAD-Coder, CAD-Coder-Qwen2.5-14B, and CAD-Coder-Qwen2.5-14B-LowLR given identical filleting prompts. Only the LowLR variant correctly applies the fillet operations.
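
The idea behind an alignment-maximized IoU score such as the IOUbest in Figure 3 can be illustrated on voxel sets. This is a simplified sketch under stated assumptions: solids are sets of integer voxel coordinates, the candidate alignments are only the four 90-degree rotations about the z-axis after moving each solid's minimum corner to the origin, and all function names are hypothetical. The paper's own alignment procedure is not reproduced here.

```python
from itertools import product

Voxel = tuple[int, int, int]


def iou(a: set[Voxel], b: set[Voxel]) -> float:
    """Intersection over union of two voxel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def box(nx: int, ny: int, nz: int) -> set[Voxel]:
    """Axis-aligned block of nx * ny * nz voxels with its corner at the origin."""
    return set(product(range(nx), range(ny), range(nz)))


def rotate_z90(v: set[Voxel]) -> set[Voxel]:
    """Rotate 90 degrees about the z-axis: (x, y, z) -> (-y, x, z)."""
    return {(-y, x, z) for x, y, z in v}


def to_origin(v: set[Voxel]) -> set[Voxel]:
    """Translate so the minimum corner of the bounding box sits at the origin."""
    if not v:
        return set()
    mx = min(x for x, _, _ in v)
    my = min(y for _, y, _ in v)
    mz = min(z for _, _, z in v)
    return {(x - mx, y - my, z - mz) for x, y, z in v}


def iou_best(a: set[Voxel], b: set[Voxel]) -> float:
    """Best IoU over the four z-axis rotations of b (a toy set of candidate alignments)."""
    a0 = to_origin(a)
    best = 0.0
    candidate = b
    for _ in range(4):
        best = max(best, iou(a0, to_origin(candidate)))
        candidate = rotate_z90(candidate)
    return best


# The same 4 x 2 x 1 plate modeled in two orientations.
plate = box(4, 2, 1)
turned = box(2, 4, 1)
print(iou(plate, turned), iou_best(plate, turned))
```

Without alignment the two plates overlap in only 4 of 12 occupied voxels, while the rotated comparison recognizes them as the same shape, which is why a score of this kind reports the best alignment rather than the raw overlap.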

Theorems & Definitions (4)

  • Lemma 1
  • proof
  • Lemma 2: Optimal Rigid‐Body Alignment
  • proof