GeoCAD: Local Geometry-Controllable CAD Generation with Large Language Models
Zhanwei Zhang, Kaiyuan Liu, Junjie Liu, Wenxiao Wang, Binbin Lin, Liang Xie, Chen Shen, Deng Cai
TL;DR
GeoCAD addresses the challenge of local geometry-controllable CAD generation by introducing a complementary captioning pipeline and a two-stage, LoRA-based fine-tuning regime for large language models. Simple parts are captioned via vertex-based analysis, while complex parts are captioned with vision-language models, together yielding approximately 221k annotated local parts. The method trains LLMs to fill masked local parts under geometric constraints, enabling users to modify specific local loops while keeping the rest of the CAD model intact. Evaluations on the DeepCAD dataset demonstrate improvements in generation quality, validity, and text-to-CAD consistency compared with baselines, highlighting GeoCAD’s potential to streamline geometry-driven CAD edits in practice.
Abstract
Local geometry-controllable computer-aided design (CAD) generation aims to modify local parts of CAD models automatically, enhancing design efficiency. It also ensures that the shapes of newly generated local parts follow user-specified geometric instructions (e.g., an isosceles right triangle or a rectangle with one corner cut off). However, existing methods struggle to achieve this goal: they either lack the ability to follow textual instructions or are unable to focus on the local parts. To address these limitations, we introduce GeoCAD, a user-friendly and local geometry-controllable CAD generation method. Specifically, we first propose a complementary captioning strategy to generate geometric instructions for local parts. This strategy involves vertex-based and VLLM-based captioning for systematically annotating simple and complex parts, respectively. In this way, we caption $\sim$221k different local parts in total. In the training stage, given a CAD model, we randomly mask a local part. Then, using its geometric instruction and the remaining parts as input, we prompt large language models (LLMs) to predict the masked part. During inference, users can specify any local part for modification while adhering to a variety of predefined geometric instructions. Extensive experiments demonstrate the effectiveness of GeoCAD in generation quality, validity, and text-to-CAD consistency. Code will be available at https://github.com/Zhanwei-Z/GeoCAD.
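The mask-and-predict training setup described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: the textual encoding of parts, the `<mask>` token, the prompt wording, and the function name `build_infilling_example` are all hypothetical.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical placeholder; the real tokenization may differ


def build_infilling_example(parts, captions, rng=None):
    """Mask one local part at random and build a (prompt, target, index) triple.

    parts:    list of strings, one per local part of a CAD model (assumed encoding)
    captions: list of geometric instructions aligned with `parts`
    rng:      optional random.Random instance for reproducibility
    """
    if not parts or len(parts) != len(captions):
        raise ValueError("parts and captions must be non-empty and aligned")
    rng = rng or random.Random()

    # Randomly choose the local part to hide from the model.
    idx = rng.randrange(len(parts))

    # Keep the remaining parts as context and replace the chosen one with the mask.
    context = [MASK_TOKEN if i == idx else part for i, part in enumerate(parts)]

    prompt = (
        "CAD sequence: " + " ; ".join(context) + "\n"
        f"Instruction for {MASK_TOKEN}: {captions[idx]}\n"
        "Predict the masked part:"
    )
    # The target is the original part that the LLM should reconstruct.
    return prompt, parts[idx], idx


if __name__ == "__main__":
    demo_parts = [
        "line 0 0 4 0, line 4 0 4 3, line 4 3 0 3, line 0 3 0 0",
        "line 1 1 2 1, line 2 1 1 2, line 1 2 1 1",
    ]
    demo_captions = ["a rectangle", "an isosceles right triangle"]
    prompt, target, idx = build_infilling_example(
        demo_parts, demo_captions, random.Random(0)
    )
    print(prompt)
    print("target:", target)
```

At inference time the same prompt shape would apply, except that the user, rather than a random draw, picks which part to mask and supplies the geometric instruction.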
