Query2CAD: Generating CAD models using natural language queries

Akshay Badagabettu, Sai Sravan Yarlagadda, Amir Barati Farimani

TL;DR

Query2CAD is a pipeline that converts natural language CAD queries into executable FreeCAD macros. Generation is followed by self-refinement loops guided by a VQA-based score on isometric views of the model and by BLIP2-generated feedback, with optional human-in-the-loop input. With GPT-4 Turbo as the LLM, it achieves a first-attempt success rate of 53.6%, rising to 76.7% after refinements, with the first refinement contributing the largest gain. The system is evaluated on a 57-query dataset spanning easy, medium, and hard tasks, attaining 95.23% accuracy on easy queries, 70% on medium queries, and 41.7% on hard queries; weaker models perform markedly worse. The authors open-source their data, model, and code, and show that iterative feedback, especially the initial refinement, substantially improves CAD generation from natural language, which suggests practical potential for rapid CAD prototyping without extensive CAD training.

Abstract

Computer-Aided Design (CAD) engineers typically do not achieve their best prototypes in a single attempt. Instead, they iterate and refine their designs to achieve an optimal solution through multiple revisions. This traditional approach, though effective, is time-consuming and relies heavily on the expertise of skilled engineers. To address these challenges, we introduce Query2CAD, a novel framework to generate CAD designs. The framework uses a large language model to generate executable CAD macros. Additionally, Query2CAD refines the generated CAD model with the help of its self-refinement loops. Query2CAD operates without supervised data or additional training, using the LLM as both a generator and a refiner. The refiner leverages feedback generated by the BLIP2 model, and to address false negatives, we have incorporated human-in-the-loop feedback into our system. Additionally, we have developed a dataset that encompasses most operations used in CAD model design and have evaluated our framework using this dataset. Our findings reveal that when we used GPT-4 Turbo as our language model, the architecture achieved a success rate of 53.6% on the first attempt. With subsequent refinements, the success rate increased by 23.1 percentage points. The most significant improvement in the success rate was observed with the first iteration of the refinement; further iterations did not significantly increase the number of correct designs. We have open-sourced our data, model, and code (github.com/akshay140601/Query2CAD).

Paper Structure (10 sections, 1 equation, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Proposed architecture of Query2CAD. The user query is passed to an LLM that generates a Python macro, which in turn produces the corresponding CAD model. The isometric view is captured, and refinement is performed if the VQA score does not cross the threshold. The loop runs at most 3 times.
  • Figure 2: The user queries were to make a bookshelf and a torus, respectively. The bookshelf was designed correctly within 1 refinement, whereas the torus was designed correctly on the first attempt (direct generation).
  • Figure 3: The user query was to make a plate in the shape of a star. The LLM first generated code that made an open pentagon shape. The feedback given was to close that shape. It then made a closed pentagon with some thickness. The second feedback given was to alter the shape to a star. A star-shaped plate was then obtained.
  • Figure 4: The bar chart shows the observed improvement in success rate after every iteration. $\Delta (y_0\rightarrow y_1)$ refers to the improvement in success rate from direct generation to the first iteration. Similarly, $\Delta (y_1\rightarrow y_2)$ and $\Delta (y_2\rightarrow y_3)$ refer to the improvements in success rate in subsequent iterations. All values are in percentages.
  • Figure 5: With GPT-4-turbo as the LLM, 69% of the 13 failed cases were due to not obtaining executable code, and the remaining 31% were due to generating the wrong structure. Similarly, with GPT-3.5-turbo as the LLM, 65.4% of failures were due to not obtaining executable code and 34.6% were due to generating the wrong structure.
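
The control flow described in the Figure 1 caption (direct generation, a VQA check on the isometric view, then at most 3 refinements) can be sketched as below. This is only an illustrative outline, not the authors' implementation: the callables `generate`, `render_and_score`, and `make_feedback`, the `Attempt` record, and the default `threshold` value are hypothetical placeholders standing in for the LLM call, the FreeCAD render plus VQA scoring, and the BLIP2 or human feedback step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_REFINEMENTS = 3  # "The loop runs at most 3 times" (Figure 1 caption)


@dataclass
class Attempt:
    macro: str               # generated macro source (y_0, y_1, ...)
    score: float             # VQA score for the rendered isometric view
    feedback: Optional[str]  # feedback used for the next attempt, if any


def refine_loop(
    query: str,
    generate: Callable[[str, Optional[str], Optional[str]], str],
    render_and_score: Callable[[str, str], float],
    make_feedback: Callable[[str, str], str],
    threshold: float = 0.5,  # assumed value; the source does not state it
    max_refinements: int = MAX_REFINEMENTS,
) -> List[Attempt]:
    """Run direct generation followed by up to `max_refinements` refinements."""
    history: List[Attempt] = []
    # Direct generation: no previous macro and no feedback yet.
    macro = generate(query, None, None)
    for iteration in range(max_refinements + 1):
        score = render_and_score(macro, query)
        # Stop when the score crosses the threshold or the budget is spent.
        if score >= threshold or iteration == max_refinements:
            history.append(Attempt(macro, score, None))
            break
        # Otherwise collect feedback and ask the same LLM to refine its macro.
        feedback = make_feedback(macro, query)
        history.append(Attempt(macro, score, feedback))
        macro = generate(query, macro, feedback)
    return history
```

Passing the three steps in as callables keeps the sketch independent of any particular LLM, renderer, or captioning model; the returned history holds one entry per attempt, so its length minus one is the number of refinements that were needed.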