CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design
Prashant Govindarajan, Davide Baldelli, Jay Pathak, Quentin Fournier, Sarath Chandar
TL;DR
CADmium presents a text-to-text framework for CAD design: GPT-4.1 generates expert-level language annotations for a large CAD corpus, and a code language model (Qwen-2.5-Coder-14B) is fine-tuned to produce JSON-based CAD histories from natural language prompts. The approach introduces geometric and topological metrics (SD, DMCD, and EECM, built on sphericity, mean curvature, and Euler characteristic) to capture 3D fidelity beyond traditional point-cloud or mesh-based scores. Experimental results show that CADmium can automate CAD design with performance competitive with or superior to Text2CAD across several tasks, scales, and datasets, while also showing improved annotation fluency and robustness to text prompts. The work advances rapid, text-conditioned CAD generation and releases a dataset, code, and models for the CAD and 3D design communities.
Abstract
Computer-aided design (CAD) is the digital construction of 2D and 3D objects, and is central to a wide range of engineering and manufacturing applications in industries such as automotive and aviation. Despite its importance, CAD modeling remains largely a time-intensive, manual task. Recent works have attempted to automate this process with small transformer-based models and handcrafted CAD sequence representations. However, there has been little effort to leverage the potential of large language models (LLMs) for sequential CAD design. In this work, we introduce a new large-scale dataset of more than 170k CAD models annotated with high-quality, human-like descriptions generated with our pipeline based on GPT-4.1. Using this dataset, we fine-tune powerful code-LLMs to generate CAD sequences represented in a JSON-based format from natural language descriptions, demonstrating the viability and effectiveness of this approach for text-conditioned CAD generation. Because simple metrics often fail to reflect the quality of generated objects, we introduce geometric and topological metrics based on sphericity, mean curvature, and Euler characteristic to provide richer structural insights. Our experiments and ablation studies on both synthetic and human-annotated data demonstrate that CADmium is able to automate CAD design, drastically speeding up the design of new objects. The dataset, code, and fine-tuned models are available online.
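To make two of the quantities named above concrete, the following is a minimal sketch of how sphericity and the Euler characteristic can be computed for a closed triangle mesh. It uses the standard textbook definitions (sphericity as the surface area of a volume-equivalent sphere divided by the object's surface area, and Euler characteristic as V - E + F). The function names and the cube test mesh are hypothetical, chosen for illustration only. The paper's actual metric definitions (SD, DMCD, EECM) are not given in this section and may differ from this sketch.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def surface_area(vertices, faces):
    """Sum of triangle areas (half the cross-product norm per face)."""
    total = 0.0
    for i, j, k in faces:
        n = _cross(_sub(vertices[j], vertices[i]), _sub(vertices[k], vertices[i]))
        total += 0.5 * math.sqrt(_dot(n, n))
    return total

def volume(vertices, faces):
    """Volume of a closed, consistently oriented mesh via signed tetrahedra."""
    total = 0.0
    for i, j, k in faces:
        total += _dot(vertices[i], _cross(vertices[j], vertices[k])) / 6.0
    return abs(total)

def sphericity(vertices, faces):
    """pi^(1/3) * (6V)^(2/3) / A; equals 1 for a perfect sphere."""
    v = volume(vertices, faces)
    a = surface_area(vertices, faces)
    return (math.pi ** (1.0 / 3.0)) * ((6.0 * v) ** (2.0 / 3.0)) / a

def euler_characteristic(faces):
    """V - E + F, counting only vertices and edges referenced by faces."""
    verts = {idx for face in faces for idx in face}
    edges = set()
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            edges.add((min(a, b), max(a, b)))
    return len(verts) - len(edges) + len(faces)

# Hypothetical test mesh: a unit cube split into 12 outward-facing triangles.
CUBE_VERTICES = [
    (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
    (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1),
]
CUBE_FACES = [
    (0, 3, 2), (0, 2, 1),  # bottom (z = 0)
    (4, 5, 6), (4, 6, 7),  # top (z = 1)
    (0, 1, 5), (0, 5, 4),  # front (y = 0)
    (3, 7, 6), (3, 6, 2),  # back (y = 1)
    (0, 4, 7), (0, 7, 3),  # left (x = 0)
    (1, 2, 6), (1, 6, 5),  # right (x = 1)
]

if __name__ == "__main__":
    print("sphericity:", sphericity(CUBE_VERTICES, CUBE_FACES))
    print("Euler characteristic:", euler_characteristic(CUBE_FACES))
```

A comparison between a generated and a reference object could then be formed, for example, by taking the absolute difference of the two sphericity values or by checking whether the two Euler characteristics agree. This is only one plausible reading of how such metrics might be assembled.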
