CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward

Yandong Guan, Xilin Wang, Ximing Xing, Jing Zhang, Dong Xu, Qian Yu

TL;DR

This paper tackles text-to-CAD by reframing the task as generating CadQuery Python scripts, enabling executable validation and richer modeling vocabularies. A two-stage training pipeline—supervised fine-tuning followed by reinforcement learning with Group Reward Policy Optimization—uses CAD-specific rewards and chain-of-thought planning to improve geometric fidelity and code correctness. A large-scale dataset of 110K text–CadQuery–3D triplets plus 1.5K CoT samples supports the training and evaluation. Experiments show CAD-Coder delivers significantly better geometric accuracy and executability than prior methods, advancing geometric reasoning in text-to-CAD generation and offering a foundation for CAD editing via natural language.

Abstract

In this work, we introduce CAD-Coder, a novel framework that reformulates text-to-CAD as the generation of CadQuery scripts, a Python-based parametric CAD language. This representation enables direct geometric validation, a richer modeling vocabulary, and seamless integration with existing LLMs. To further enhance code validity and geometric fidelity, we propose a two-stage learning pipeline: (1) supervised fine-tuning on paired text-CadQuery data, and (2) reinforcement learning with Group Reward Policy Optimization (GRPO), guided by a CAD-specific reward comprising both a geometric reward (Chamfer Distance) and a format reward. We also introduce a chain-of-thought (CoT) planning process to improve model reasoning, and construct a large-scale, high-quality dataset of 110K text-CadQuery-3D model triplets and 1.5K CoT samples via an automated pipeline. Extensive experiments demonstrate that CAD-Coder enables LLMs to generate diverse, valid, and complex CAD models directly from natural language, advancing the state of the art in text-to-CAD generation and geometric reasoning.
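The CAD-specific reward described above can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation and rests on several assumptions: both shapes are taken to be already sampled into 3D point clouds (sampling points from a CadQuery solid is not shown), the Chamfer Distance follows one common symmetric convention (mean squared nearest-neighbour distance in both directions), and the `<think>`/`<answer>` response template, the exponential CD-to-reward mapping with its `scale` parameter, and the equal weighting of the two reward terms are all hypothetical. The last function shows the group-relative advantage that GRPO is usually described with: each sampled response's reward is standardized against the other responses drawn for the same prompt.

```python
import math
import re
import statistics
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance: mean squared nearest-neighbour distance
    from a to b plus the same from b to a (brute force, O(|a| * |b|))."""
    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        total = 0.0
        for p in src:
            total += min(
                (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
                for q in dst
            )
        return total / len(src)

    return one_way(a, b) + one_way(b, a)


# Hypothetical response template: CoT plan in <think> tags, CadQuery code in <answer> tags.
FORMAT_PATTERN = re.compile(r"^<think>.+?</think>\s*<answer>.+?</answer>\s*$", re.DOTALL)


def format_reward(response: str) -> float:
    """1.0 if the response follows the assumed template, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(response) else 0.0


def geometric_reward(cd: Optional[float], scale: float = 10.0) -> float:
    """Hypothetical mapping from Chamfer Distance to a reward in (0, 1].
    cd is None when the generated script failed to execute."""
    if cd is None:
        return 0.0
    return math.exp(-scale * cd)


def total_reward(response: str, cd: Optional[float]) -> float:
    # Assumption: the geometric and format terms are weighted equally.
    return geometric_reward(cd) + format_reward(response)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize each reward by its group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

In use, one would sample several responses per prompt, execute each generated script, compute its Chamfer Distance against the ground-truth model (or pass `None` on failure), score it with `total_reward`, and hand the group's rewards to `group_relative_advantages`. Because the advantage is relative, a group in which every response receives the same reward contributes zero advantage.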

Paper Structure

This paper contains 18 sections, 2 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: (a) Text description of a CAD model. (b) Corresponding sketch-extrusion command sequence used in DeepCAD. (c) Corresponding CadQuery code used in our method. The bottom row shows the resulting 3D models generated by each of the three sequential operations.
  • Figure 2: CAD-Coder Training Pipeline
  • Figure 3: Qualitative comparison between baseline methods and different model variants under various training strategies. Text2CAD is a command-sequence-based baseline; DeepSeek-V3 and Claude-3.7 represent open-source and proprietary LLMs, respectively. The right columns show our ablation variants and full model, which best preserve structure and geometry.
  • Figure 4: Overview of our annotation pipeline. Given CAD command sequences and natural language descriptions from the Text2CAD dataset, we use DeepSeek-V3 to synthesize multiple CadQuery code candidates. These candidates are executed and compared to the ground-truth 3D models using the Chamfer Distance (CD). Scripts that execute successfully and achieve the lowest CD are retained. Finally, we construct a dataset comprising text-CadQuery-3D model triplets.
  • Figure 5: (a) Chamfer Distance (CD) distributions of generated CAD models trained with different strategies. (b) Visualizations of predicted CAD models across three CD intervals. Gray shapes represent ground-truth models, while brown shapes denote generated models. The first row shows results with $\mathrm{CD} > 1 \times 10^{-1}$, indicating that the generated CAD models differ substantially from the ground truth. The second row presents models with $1 \times 10^{-4} < \mathrm{CD} \le 1 \times 10^{-1}$, and the third row displays models with $\mathrm{CD} \le 1 \times 10^{-4}$, indicating that these models are nearly identical to the ground-truth models.
  • ...and 4 more figures
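The candidate-selection step of the annotation pipeline in Figure 4 and the three CD intervals in Figure 5 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: executing the DeepSeek-V3-generated CadQuery scripts and computing their Chamfer Distance to the ground-truth model are assumed to have been done already, and the `Candidate` record and the `select_best` and `cd_interval` functions are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One LLM-generated CadQuery script and the outcome of executing it (hypothetical record)."""
    code: str
    executed: bool        # True if the script ran without error
    cd: Optional[float]   # Chamfer Distance to the ground-truth model, if executed


def select_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Keep only candidates that executed successfully and return the one with the lowest CD.
    Returns None when no candidate executed, so the sample yields no triplet."""
    valid = [c for c in candidates if c.executed and c.cd is not None]
    if not valid:
        return None
    return min(valid, key=lambda c: c.cd)


def cd_interval(cd: float) -> str:
    """Bucket a Chamfer Distance into the three intervals visualized in Figure 5."""
    if cd > 1e-1:
        return "CD > 1e-1"           # substantially different from the ground truth
    if cd > 1e-4:
        return "1e-4 < CD <= 1e-1"
    return "CD <= 1e-4"              # nearly identical to the ground truth
```

Selecting by minimum CD among executable scripts, rather than accepting the first script that runs, is what lets the retained CadQuery code stay geometrically close to the original command sequence.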