Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts

Mohammad Sadil Khan, Sankalp Sinha, Talha Uddin Sheikh, Didier Stricker, Sk Aziz Ali, Muhammad Zeshan Afzal

TL;DR

This work proposes Text2CAD, the first AI framework for generating parametric CAD models from text, using designer-friendly instructions for all skill levels. It also introduces a data annotation pipeline that uses Mistral and LLaVA-NeXT to generate natural-language text prompts for the DeepCAD dataset.

Abstract

Prototyping complex computer-aided design (CAD) models in modern software can be very time-consuming. This is due to the lack of intelligent systems that can quickly generate simpler intermediate parts. We propose Text2CAD, the first AI framework for generating text-to-parametric CAD models using designer-friendly instructions for all skill levels. Furthermore, we introduce a data annotation pipeline for generating text prompts based on natural language instructions for the DeepCAD dataset using Mistral and LLaVA-NeXT. The dataset contains $\sim170$K models and $\sim660$K text annotations, ranging from abstract CAD descriptions (e.g., generate two concentric cylinders) to detailed specifications (e.g., draw two circles with center $(x,y)$ and radii $r_{1}$, $r_{2}$, and extrude along the normal by $d$...). Within the Text2CAD framework, we propose an end-to-end transformer-based auto-regressive network to generate parametric CAD models from input texts. We evaluate the performance of our model through a mixture of metrics, including visual quality, parametric precision, and geometrical accuracy. Our proposed framework shows great potential in AI-aided design applications. Our source code and annotations will be publicly available.
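
The detailed-level prompt quoted in the abstract (two circles with a shared center, radii $r_{1}$ and $r_{2}$, extruded by $d$) can be read as a short sketch-and-extrude sequence. The following is a minimal illustrative sketch of such a sequence in Python. The names `Circle`, `Extrude`, and `ring_volume` are hypothetical and are not the Text2CAD or DeepCAD data format, and treating the inner circle as a hole is an assumption made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """A sketch circle with center (cx, cy) and radius r (hypothetical type)."""
    cx: float
    cy: float
    r: float


@dataclass(frozen=True)
class Extrude:
    """An extrusion along the sketch normal by `distance` (hypothetical type)."""
    distance: float


def ring_volume(outer: Circle, inner: Circle, ext: Extrude) -> float:
    """Volume of the region between two concentric circles, extruded.

    Assumption for this example: the inner circle is a hole in the outer one.
    """
    if (outer.cx, outer.cy) != (inner.cx, inner.cy):
        raise ValueError("circles must be concentric")
    if not 0 < inner.r < outer.r:
        raise ValueError("expected 0 < inner radius < outer radius")
    if ext.distance <= 0:
        raise ValueError("extrusion distance must be positive")
    return math.pi * (outer.r ** 2 - inner.r ** 2) * ext.distance


# "Draw two circles with center (0, 0) and radii 2 and 1, extrude by 0.5."
sequence = [Circle(0.0, 0.0, 2.0), Circle(0.0, 0.0, 1.0), Extrude(0.5)]
volume = ring_volume(*sequence)
```

An abstract prompt such as "generate two concentric cylinders" leaves all of these parameters unspecified, whereas the detailed prompt fixes each one, which is the beginner-to-expert range the annotations are described as covering.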

Paper Structure

This paper contains 16 sections, 9 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Designers can efficiently generate parametric CAD models from text prompts. The prompts can vary from abstract shape descriptions to detailed parametric instructions.
  • Figure 2: Text2CAD Data Annotation Pipeline: Our data annotation pipeline generates multi-level text prompts describing the construction workflow of a CAD model with varying complexities. We use a two-stage method: (Stage 1) shape description generation using a VLM, and (Stage 2) multi-level textual annotation generation using an LLM.
  • Figure 3: Network architecture: The Text2CAD Transformer takes as input a text prompt $T$ and a CAD subsequence $\mathbf{C}_{1:t-1}$ of length $t-1$. The text embedding $\mathbf{T}_{adapt}$ is extracted from $T$ using a pretrained BERT encoder [Bert] followed by a trainable Adaptive layer. The resulting embedding $\mathbf{T}_{adapt}$ and the CAD sequence embedding $\mathbf{F}^0_{t-1}$ are passed through $\mathbf{L}$ decoder blocks to generate the full CAD sequence in an auto-regressive way.
  • Figure 4: Parametric CAD model generation by Text2CAD transformer using different text prompts. Our text prompts follow a certain structure highlighting the different design aspects of CAD construction workflow (shown in different colors).
  • Figure 5: Qualitative results of the reconstructed CAD models of DeepCAD [Wu_2021_ICCV] and Text2CAD on the DeepCAD dataset. From top to bottom: input texts, CAD models reconstructed by DeepCAD and Text2CAD respectively, and GPT-4V evaluation.
  • ...and 8 more figures
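
The Figure 3 caption describes auto-regressive generation: each new CAD token is predicted from the text embedding and the tokens generated so far. The following is a minimal sketch of a greedy decoding loop of that kind, in pure Python. The function `toy_step` is a hypothetical stand-in for the decoder blocks, and the `START`/`END` token ids and greedy selection are assumptions for illustration. None of this is the authors' implementation.

```python
from typing import Callable, List, Sequence

START, END = 0, 1  # hypothetical special token ids

StepFn = Callable[[Sequence[float], Sequence[int]], List[float]]


def greedy_decode(step_fn: StepFn,
                  text_embedding: Sequence[float],
                  max_len: int = 16) -> List[int]:
    """Generate a token sequence one step at a time.

    At each step the model sees the text embedding and the prefix generated
    so far, and the highest-scoring token is appended to the prefix.
    """
    seq = [START]
    while len(seq) < max_len:
        logits = step_fn(text_embedding, seq)
        next_token = max(range(len(logits)), key=logits.__getitem__)
        seq.append(next_token)
        if next_token == END:
            break
    return seq


def toy_step(text_embedding: Sequence[float],
             prefix: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for the decoder: emits tokens 2, 3, 4, then END."""
    vocab_size = 5
    target = len(prefix) + 1 if len(prefix) < 4 else END
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


tokens = greedy_decode(toy_step, text_embedding=[0.0])
```

In a real model, `step_fn` would run the prefix through the decoder with the text embedding as conditioning. The loop structure (predict, append, repeat until an end token or a length limit) is what "auto-regressive" refers to.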