GeoCAD: Local Geometry-Controllable CAD Generation with Large Language Models

Zhanwei Zhang, Kaiyuan Liu, Junjie Liu, Wenxiao Wang, Binbin Lin, Liang Xie, Chen Shen, Deng Cai

TL;DR

GeoCAD addresses the challenge of local geometry-controllable CAD generation by introducing a complementary captioning pipeline and a two-stage, LoRA-based fine-tuning regime for large language models. Simple parts are captioned via vertex-based analysis, while complex parts are captioned with vision-language models, together yielding approximately 221k annotated local parts. The method trains LLMs to fill masked local parts under geometric constraints, enabling users to modify specific local loops while keeping the rest of the CAD model intact. Evaluations on the DeepCAD dataset demonstrate improvements in generation quality, validity, and text-to-CAD consistency compared with baselines, highlighting GeoCAD’s potential to streamline geometry-driven CAD edits in practice.
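As a rough illustration of vertex-based captioning for simple parts, the sketch below maps the vertex coordinates of a straight-edged loop to a shape name using side lengths and right-angle checks. The function names (`caption_simple_loop`, `side_lengths`, `is_right_angle`), the tolerance, and the rule set are hypothetical assumptions for illustration; the paper's actual geometric analysis and caption vocabulary may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def side_lengths(vertices: List[Point]) -> List[float]:
    """Edge lengths of a closed polygon; edge i runs from vertex i to vertex i+1."""
    n = len(vertices)
    return [math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n)]


def is_right_angle(a: Point, b: Point, c: Point, tol: float) -> bool:
    """True if the angle at b (between edges b->a and b->c) is about 90 degrees."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # |cos(theta)| <= tol means the two edges are (nearly) perpendicular
    return abs(dot) <= tol * math.hypot(*v1) * math.hypot(*v2)


def caption_simple_loop(vertices: List[Point], tol: float = 1e-6) -> str:
    """Map a simple straight-edged loop to a shape caption (hypothetical rules)."""
    n = len(vertices)
    sides = side_lengths(vertices)
    # vertices[i - 1] wraps around to the last vertex when i == 0
    rights = [is_right_angle(vertices[i - 1], vertices[i], vertices[(i + 1) % n], tol)
              for i in range(n)]
    if n == 3:
        equal_pairs = sum(math.isclose(sides[i], sides[j], rel_tol=tol)
                          for i in range(3) for j in range(i + 1, 3))
        if equal_pairs == 3:
            return "equilateral triangle"
        if any(rights):
            return "isosceles right triangle" if equal_pairs else "right triangle"
        return "isosceles triangle" if equal_pairs else "triangle"
    if n == 4 and all(rights):
        return "square" if math.isclose(sides[0], sides[1], rel_tol=tol) else "rectangle"
    # Anything else falls through to a generic caption
    return f"polygon with {n} vertices"
```

For example, `caption_simple_loop([(0, 0), (1, 0), (0, 1)])` yields `"isosceles right triangle"`. Loops that only reach the generic fallback, or that contain arcs and many vertices, correspond to the "complex" parts that GeoCAD instead renders to 2D images and captions with a VLLM.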

Abstract

Local geometry-controllable computer-aided design (CAD) generation aims to modify local parts of CAD models automatically, enhancing design efficiency. It also ensures that the shapes of newly generated local parts follow user-specified geometric instructions (e.g., an isosceles right triangle or a rectangle with one corner cut off). However, existing methods struggle to achieve this goal: they either lack the ability to follow textual instructions or are unable to focus on local parts. To address these limitations, we introduce GeoCAD, a user-friendly, local geometry-controllable CAD generation method. We first propose a complementary captioning strategy to generate geometric instructions for local parts. This strategy combines vertex-based captioning and VLLM-based captioning (using vision large language models) to systematically annotate simple and complex parts, respectively. In this way, we caption about 221k different local parts in total. In the training stage, given a CAD model, we randomly mask a local part. Then, using its geometric instruction and the remaining parts as input, we prompt large language models (LLMs) to predict the masked part. During inference, users can select any local part for modification and guide it with a variety of predefined geometric instructions. Extensive experiments demonstrate the effectiveness of GeoCAD in terms of generation quality, validity, and text-to-CAD consistency. Code will be available at https://github.com/Zhanwei-Z/GeoCAD.
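The masking-and-prediction setup described in the abstract can be sketched as follows. This is a minimal illustration that assumes a CAD model is serialized as a list of loop strings and that a single placeholder token marks the removed part; `MASK_TOKEN`, the prompt wording, and all helper names are hypothetical, and the paper's own prompt templates are the ones described in Figures 3 and 4.

```python
import random
from typing import List, Tuple

# Hypothetical placeholder marking the removed local part in the serialized model.
MASK_TOKEN = "<mask>"


def mask_loop(loops: List[str], idx: int) -> List[str]:
    """Replace the loop at position idx with the mask placeholder."""
    return loops[:idx] + [MASK_TOKEN] + loops[idx + 1:]


def build_prompt(masked_loops: List[str], instruction: str) -> str:
    """Combine the geometric instruction with the remaining (unmasked) parts."""
    return (
        f"Geometric instruction: {instruction}\n"
        f"CAD model: {' '.join(masked_loops)}\n"
        f"Predict the local part at {MASK_TOKEN}:"
    )


def make_training_example(loops: List[str], instructions: List[str],
                          rng: random.Random) -> Tuple[str, str]:
    """Training: randomly mask one loop; the LLM's target is the original loop."""
    idx = rng.randrange(len(loops))
    prompt = build_prompt(mask_loop(loops, idx), instructions[idx])
    return prompt, loops[idx]


def infill(masked_loops: List[str], predicted: str) -> List[str]:
    """Inference: substitute the model's predicted loop back into the masked slot."""
    return [predicted if loop == MASK_TOKEN else loop for loop in masked_loops]


if __name__ == "__main__":
    loops = ["loopA", "loopB", "loopC"]  # stand-ins for serialized loop commands
    instructions = ["a square", "an isosceles right triangle", "a circle"]
    prompt, target = make_training_example(loops, instructions, random.Random(0))
    print(prompt)
    print("target:", target)
```

At inference the same `mask_loop` call is driven by the user's choice of part rather than by `rng`, and `infill` rebuilds the model from the LLM's output, so every unmasked loop is carried over unchanged.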

Paper Structure

This paper contains 21 sections, 13 figures, and 4 tables.

Figures (13)

  • Figure 1: Local geometry-controllable CAD generation achieved by GeoCAD. The input comprises: (1) an original CAD model (left), (2) the local part to be modified (highlighted in blue), and (3) user-specified geometric instructions. GeoCAD then outputs revised CAD models in which only the target part is altered, following the provided geometric instructions.
  • Figure 2: The complementary captioning strategy. (a) Vertex-based captioning for simple local parts. Vertex coordinates are initially extracted, followed by geometric analysis to enable precise captions. (b) VLLM-based captioning for complex local parts. We first convert complex parts into 2D images and subsequently employ powerful VLLMs to produce descriptive captions.
  • Figure 3: The prompt template used in stage 1. Local parts are first augmented through translation, scaling, rotation, and reflection. Subsequently, we construct the corresponding prompt that incorporates the geometric instruction, and ask LLMs to predict both the initial and augmented parts.
  • Figure 4: The prompt template used in stage 2. Given a local part (highlighted in blue) in a CAD model, we formulate the prompt that integrates the geometric instruction (highlighted in green) and the remaining parts of the CAD model, and require LLMs to predict this local part.
  • Figure 5: The overall framework of GeoCAD. (a) Training process. Given a CAD model, we randomly mask a local loop within it. During stages 1 and 2, we design the corresponding prompts (as introduced in Fig. 3 and Fig. 4) and fine-tune LLMs. (b) Inference process. Users can optionally mask any local part for modification, driven by various geometric instructions (GI). The masked part is then infilled with the predicted local parts to construct new CAD models.
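Figure 3 describes augmenting local parts through translation, scaling, rotation, and reflection for the stage-1 prompts. A minimal sketch of such augmentation on a part's 2D vertex coordinates follows. The function names and parameter ranges are hypothetical, and the sketch assumes uniform scaling; the paper's actual augmentation settings are not given here.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def similarity_transform(points: List[Point], scale: float, angle: float,
                         dx: float, dy: float, reflect: bool) -> List[Point]:
    """Apply reflection, uniform scaling, rotation, then translation to each vertex."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y in points:
        if reflect:
            x = -x  # reflect across the y-axis
        x, y = scale * x, scale * y              # uniform scaling
        x, y = c * x - s * y, s * x + c * y      # rotation about the origin
        out.append((x + dx, y + dy))             # translation
    return out


def augment_part(points: List[Point], rng: random.Random, n: int = 4) -> List[List[Point]]:
    """Produce n randomly transformed copies of a local part (illustrative ranges)."""
    return [
        similarity_transform(
            points,
            scale=rng.uniform(0.5, 2.0),
            angle=rng.uniform(0.0, 2.0 * math.pi),
            dx=rng.uniform(-10.0, 10.0),
            dy=rng.uniform(-10.0, 10.0),
            reflect=rng.random() < 0.5,
        )
        for _ in range(n)
    ]
```

This sketch restricts itself to similarity transforms because they preserve angles and side-length ratios, so a caption such as "isosceles right triangle" stays valid for every augmented copy; non-uniform scaling would generally break such captions.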