Learning From Design Procedure To Generate CAD Programs for Data Augmentation

Yan-Ying Chen, Dule Shu, Matthew Hong, Andrew Taber, Jonathan Li, Matthew Klenk

TL;DR

This paper proposes a novel data augmentation paradigm that prompts an LLM to generate CAD programs conditioned on a reference surface program and a modeling procedure. The idea, inspired by practices in industrial design, enriches the geometric distribution of generated CAD models.

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities in a wide range of code generation tasks. However, generating code for certain domains remains challenging. One such domain is Computer-Aided Design (CAD) programming, where the goal is to produce scripted parametric models that define object geometry for precise design and manufacturing applications. A key challenge in LLM-based CAD program generation is the limited geometric complexity of generated shapes compared to those found in real-world industrial designs. This shortfall is in part due to the lack of diversity in the available CAD program training data. To address this, we propose a novel data augmentation paradigm that prompts an LLM to generate CAD programs conditioned on a reference surface program and a modeling procedure, an idea inspired by practices in industrial design. By varying the reference surface using a collection of organic shapes, our method enriches the geometric distribution of generated CAD models. In particular, it introduces edges and faces defined by spline-based curvature, which are typically missing or underrepresented in existing open-source CAD program datasets. Experiments show that our method produces CAD samples with significantly greater geometric diversity and a higher resemblance to industry-grade CAD designs in terms of the proportion of organic shape primitives. This enhancement makes our CAD data augmentation approach a useful tool for training LLMs and other deep learning models in CAD generation.
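
The "proportion of organic shape primitives" mentioned in the abstract can be illustrated with a minimal sketch. The paper's exact definition is not reproduced on this page, so the code below assumes the simplest reading: the fraction of a model's primitives (faces or edges) whose geometry type is a B-spline. The function name `bspline_ratio` and the type label `"bspline"` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable


def bspline_ratio(primitive_types: Iterable[str]) -> float:
    """Fraction of primitives labeled as B-splines.

    Assumption: each face or edge of a B-rep has already been mapped to a
    type label such as "plane", "cylinder", or "bspline". How the labels are
    extracted from a CAD kernel is outside this sketch.
    """
    counts = Counter(label.lower() for label in primitive_types)
    total = sum(counts.values())
    if total == 0:
        # An empty model has no primitives; report zero rather than divide by 0.
        return 0.0
    return counts["bspline"] / total


# Hypothetical face labels for a single bracket model.
faces = ["plane", "bspline", "cylinder", "bspline"]
ratio = bspline_ratio(faces)
```

Computing this ratio per sample and plotting its histogram per dataset would give a distribution of the kind Figure 3 describes, under the assumption stated above.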

Paper Structure (17 sections, 1 equation, 6 figures, 3 tables)

Figures (6)

  • Figure 1: It is common to use a reference surface to guide CAD creation for a specific design intent, e.g., compatibility with other components. Motivated by this idea, we propose a new design procedure prompting approach that guides CAD program generation toward more organic shapes.
  • Figure 2: System overview: (1) Design procedure prompting takes a design description and a reference surface program as input to formulate the design procedure. (2) The design prompt then conditions an LLM to generate a CAD program. (3) Program validation executes the generated program to visualize a CAD B-rep. (4) Structure validation checks the validity of the CAD B-rep.
  • Figure 3: Distribution of B-Spline ratio over data samples: Industry, DeepCAD-b and ours only contain bracket data. ABC, GenCAD and CAD-MLLM include brackets and other objects.
  • Figure 4: Examples of CAD B-rep visualizations from our approach using reference surfaces, and from the alternatives (-RT) and (-R), which exclude reference surfaces. (-R) includes text guidance that prompts for smooth and organic shapes. Our approach with reference surfaces generates more B-Spline shapes.
  • Figure 5: Visualization of the generated program modules. The left side of each column is a generated bracket with its reference surface, and the right side is the bracket after the reference surface is removed.
  • ...and 1 more figures
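
The four stages in the Figure 2 caption can be sketched as a small pipeline. This is an illustrative outline under stated assumptions, not the authors' implementation: the prompt wording, the `AugmentationResult` type, and the injected callables `llm`, `run_program`, and `check_structure` are all hypothetical stand-ins for an LLM client, a CAD kernel executor, and a B-rep validity check.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class AugmentationResult:
    program: str                 # CAD program text returned by the LLM
    valid: bool                  # True only if both validation stages pass
    error: Optional[str] = None  # Reason for rejection, if any


def build_design_prompt(description: str, reference_surface_program: str) -> str:
    """Stage 1: combine the design description and reference surface program.

    The wording of the procedure below is an assumption made for illustration.
    """
    return (
        "Design description:\n" + description + "\n\n"
        "Reference surface program:\n" + reference_surface_program + "\n\n"
        "Modeling procedure:\n"
        "1. Build the reference surface.\n"
        "2. Model the part so that it conforms to the reference surface.\n"
        "3. Remove the reference surface and return the final solid.\n"
    )


def generate_sample(
    description: str,
    reference_surface_program: str,
    llm: Callable[[str], str],
    run_program: Callable[[str], Any],
    check_structure: Callable[[Any], bool],
) -> AugmentationResult:
    prompt = build_design_prompt(description, reference_surface_program)
    program = llm(prompt)  # Stage 2: the LLM generates a CAD program
    try:
        shape = run_program(program)  # Stage 3: program validation by execution
    except Exception as exc:
        return AugmentationResult(program, False, f"program validation failed: {exc}")
    if not check_structure(shape):  # Stage 4: structure validation of the B-rep
        return AugmentationResult(program, False, "structure validation failed")
    return AugmentationResult(program, True)
```

Passing the three stages in as callables keeps the sketch independent of any particular LLM client or CAD kernel. Varying `reference_surface_program` across a collection of organic shapes, as the abstract describes, would amount to calling `generate_sample` once per reference surface and keeping only the results with `valid` set to `True`.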