FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models

Zhanwei Zhang, Shizhao Sun, Wenxiao Wang, Deng Cai, Jiang Bian

TL;DR

FlexCAD tackles controllable CAD generation across all construction hierarchies of sketch-and-extrude modeling (SEM) by converting CAD models to structured text and fine-tuning LLMs with hierarchy-aware masking. A single unified model infills any masked hierarchical field, and LoRA keeps fine-tuning of the 8B LLM efficient. On the DeepCAD benchmark, FlexCAD achieves better generation quality and controllability than baselines such as GPT-4o, SkexGen, and Hnc-cad, and it supports iterative, multi-hierarchy editing. The result is a practical, end-to-end approach to user-intent-driven CAD generation and editing.

Abstract

Recently, there has been growing interest in creating computer-aided design (CAD) models based on user intent, known as controllable CAD generation. Existing work offers limited controllability and needs separate models for different types of control, reducing efficiency and practicality. To achieve controllable generation across all CAD construction hierarchies, such as sketch-extrusion, extrusion, sketch, face, loop and curve, we propose FlexCAD, a unified model obtained by fine-tuning large language models (LLMs). First, to enhance comprehension by LLMs, we represent a CAD model as structured text by abstracting each hierarchy as a sequence of text tokens. Second, to address various controllable generation tasks in a unified model, we introduce a hierarchy-aware masking strategy. Specifically, during training, we mask a hierarchy-aware field in the CAD text with a mask token. This field, composed of a sequence of tokens, can be set flexibly to represent various hierarchies. Subsequently, we ask LLMs to predict this masked field. During inference, the user intent is converted into a CAD text with a mask token replacing the part the user wants to modify, which is then fed into FlexCAD to generate new CAD models. Comprehensive experiments on a public dataset demonstrate the effectiveness of FlexCAD in both generation quality and controllability. Code will be available at https://github.com/microsoft/FlexCAD.
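The masking step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration rather than the paper's actual format: the `<mask>` token, the `<curve_end>` and `<loop_end>` delimiters, the `type,x,y` curve encoding, the restriction to two hierarchy levels, and the helper names `field_spans`, `mask_field`, `infill`, and `training_example`.

```python
import random
from typing import List, Tuple

MASK = "<mask>"  # assumed mask token, not necessarily the paper's
LEVELS = ["curve", "loop"]  # ordered fine to coarse; the paper covers more levels
END = {"curve": "<curve_end>", "loop": "<loop_end>"}  # assumed delimiters


def field_spans(tokens: List[str], level: str) -> List[Tuple[int, int]]:
    """Return (start, stop) slices covering each field at the given level."""
    own_end = END[level]
    coarser = {END[name] for name in LEVELS[LEVELS.index(level) + 1:]}
    spans, start = [], 0
    for i, tok in enumerate(tokens):
        if tok == own_end:
            spans.append((start, i + 1))
            start = i + 1
        elif tok in coarser:
            # A coarser delimiter closes the parent, so the next field starts after it.
            start = i + 1
    return spans


def mask_field(tokens: List[str], span: Tuple[int, int]):
    """Replace one field with MASK; return (masked input, prediction target)."""
    start, stop = span
    return tokens[:start] + [MASK] + tokens[stop:], tokens[start:stop]


def infill(masked: List[str], prediction: List[str]) -> List[str]:
    """Put a predicted field back where the mask token sits."""
    i = masked.index(MASK)
    return masked[:i] + prediction + masked[i + 1:]


def training_example(tokens: List[str], rng: random.Random):
    """Pick a random level and field, as could be redone at every epoch."""
    level = rng.choice(LEVELS)
    span = rng.choice(field_spans(tokens, level))
    return mask_field(tokens, span)


tokens = ("line,0,0 <curve_end> line,4,0 <curve_end> line,4,3 <curve_end> <loop_end> "
          "circle,2,1,1 <curve_end> <loop_end>").split()
masked, target = training_example(tokens, random.Random(0))
# Infilling the ground-truth target must reproduce the original text.
assert infill(masked, target) == tokens
```

Because the masked span is chosen per hierarchy level, the same pair of functions produces curve-level and loop-level training examples, which is the property that lets one model serve several editing tasks.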

Paper Structure

This paper contains 18 sections, 17 figures, 7 tables.

Figures (17)

  • Figure 1: Controllable CAD generation achieved by FlexCAD. In each sub-figure, the left side shows the input: an original CAD model along with the part the user intends to modify (highlighted in blue). The right side displays the output: multiple new CAD models with only the chosen part changed. Users can specify the part at any CAD construction hierarchy, ranging from coarse levels like sketch-extrusion to fine levels like curve (as illustrated from (a) to (f)).
  • Figure 2: The overall framework of FlexCAD. (a) Training process. Initially, a CAD model is converted into a structured text. Next, a hierarchy-aware masking strategy is proposed to mask a specific field in the text with a special mask token. This field is set differently at each epoch to reflect various hierarchies. Then, LLMs are fine-tuned to predict the masked field. (b) Inference process. The original CAD model is transformed into a structured text with a mask token replacing the part the user wants to change. The fine-tuned LLMs are provided with this masked text to generate diverse predictions, which are then converted into new CAD models by infilling and rendering.
  • Figure 3: (a) An illustration of the construction hierarchies of a CAD model. (b) Structured text representation for the CAD model shown in (a). The colors beneath the texts in (b) indicate the relationship to the construction hierarchies depicted in (a), e.g., blue for a curve and green for a loop.
  • Figure 4: (a) illustrates a CAD model and its structural diagram. (b), (c), (d) and (e) are four examples of prompt templates with mask tokens designed to represent different construction hierarchies. The masked fields for different hierarchies in the CAD model are highlighted in blue.
  • Figure 5: Qualitative comparison results for four methods. The first row displays three original CAD models, where the color of each sketch-extrusion matches that in the corresponding structural diagram. In the following rows, given a CAD model, we randomly select four of its newly predicted models for each method. The marks below the predictions are the corresponding masked and modified sketches or extrusions. The red boxes mark some of the most unrealistic examples. The blue boxes mark some of the most obvious cases where multiple fields change simultaneously in the same CAD model, rather than just the expected masked field.
  • ...and 12 more figures
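
The inference process in the Figure 2(b) caption (mask the chosen part, sample several predictions, infill each one) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `<mask>` token, the text format, and the functions `generate_variants` and `toy_sampler` are hypothetical, and `toy_sampler` is a random stand-in for the fine-tuned LLM, not the model itself.

```python
import random
from typing import Callable, List

MASK = "<mask>"  # assumed mask token; the paper's actual token may differ


def generate_variants(cad_text: str, part: str,
                      sampler: Callable[[str], str], n: int = 4) -> List[str]:
    """Mask `part` in `cad_text`, sample n predictions, and infill each one."""
    if cad_text.count(part) != 1:
        raise ValueError("part must occur exactly once in the CAD text")
    masked = cad_text.replace(part, MASK)
    # Each call to the sampler stands for one sampled completion from the model.
    return [masked.replace(MASK, sampler(masked)) for _ in range(n)]


def toy_sampler(masked_text: str) -> str:
    """Stand-in for the fine-tuned LLM: returns a circle with a random radius."""
    radius = random.randint(1, 5)
    return f"circle,2,1,{radius} <curve_end>"


original = "line,0,0 <curve_end> circle,2,1,1 <curve_end> <loop_end>"
variants = generate_variants(original, "circle,2,1,1 <curve_end>", toy_sampler, n=3)
for text in variants:
    print(text)  # only the masked curve differs from the original
```

In a real pipeline the sampler would wrap a sampling call to the fine-tuned model, and each infilled text would then be converted back into a CAD model and rendered.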