CAD-Prompted Generative Models: A Pathway to Feasible and Novel Engineering Designs

Leah Chong, Jude Rayan, Steven Dow, Ioanna Lykourentzou, Faez Ahmed

TL;DR

This paper tackles the mismatch between text-to-image generative models and engineering feasibility by introducing CAD image prompting, which uses a CLIP-searched CAD image as an image prompt alongside the text prompt to steer generation toward more feasible designs. A bike-design case study with Stable Diffusion 2.1 demonstrates that CAD prompting increases perceived feasibility, with a detectable tradeoff: higher CAD weights improve feasibility but reduce novelty, particularly beyond weight ~0.83. The authors provide stage-aware guidelines for selecting prompting weights and argue that this approach broadens the applicability of T2I models in engineering design, potentially enabling closer integration with image-to-CAD workflows for iterative design refinement. Overall, the method offers a practical pathway to make generative imagery more actionable in engineering contexts, while acknowledging model-specific effects and the need for further validation across tasks.

Abstract

Text-to-image generative models have increasingly been used to assist designers during concept generation in various creative domains, such as graphic design, user interface design, and fashion design. However, their applications in engineering design remain limited due to the models' challenges in generating images of feasible design concepts. To address this issue, this paper introduces a method that improves design feasibility by prompting the generation with feasible CAD images. In this work, the usefulness of this method is investigated through a case study with a bike design task using an off-the-shelf text-to-image model, Stable Diffusion 2.1. A diverse set of bike designs is produced in seven different generation settings with varying CAD image prompting weights, and these designs are evaluated on their perceived feasibility and novelty. Results demonstrate that CAD image prompting successfully helps text-to-image models like Stable Diffusion 2.1 create visibly more feasible design images. While a general tradeoff is observed between feasibility and novelty, when the prompting weight is kept low, around 0.35, design feasibility is significantly improved while novelty remains on par with that of designs generated by text prompts alone. The insights from this case study offer some guidelines for selecting the appropriate CAD image prompting weight for different stages of the engineering design process. When utilized effectively, our CAD image prompting method opens doors to a wider range of applications of text-to-image models in engineering design.

Paper Structure (12 sections, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Our CAD image prompting method. Based on a designer's text prompt, the CLIP model searches for the most suitable CAD image that is then input into a T2I model and guides the image generation process alongside the original text prompt.
  • Figure 2: Example evaluation questions for a bike design concept.
  • Figure 3: Perceived feasibility and novelty evaluation results for the bike designs generated in the seven generation settings. (a) shows a positive correlation between the weight of the CAD image prompt and the perceived feasibility of the generated images, while (b) shows a negative correlation between that weight and their perceived novelty.
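
The pipeline in the Figure 1 caption (retrieve the CAD image closest to the text prompt, then let it guide generation with some weight) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the three-dimensional vectors stand in for real CLIP embeddings, the file names are hypothetical, and the linear blend is only one simple way a prompting weight could combine text and image guidance. This summary does not say how the weight is applied inside Stable Diffusion 2.1.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_cad_image(text_embedding, cad_embeddings):
    """Return the name of the CAD image whose embedding best matches the text.

    cad_embeddings maps an image name to its embedding vector. In a real
    system both sides would come from a CLIP encoder; here they are toy data.
    """
    return max(
        cad_embeddings,
        key=lambda name: cosine_similarity(text_embedding, cad_embeddings[name]),
    )


def blend_prompts(text_embedding, image_embedding, cad_weight):
    """Linearly interpolate between text and CAD-image guidance.

    Assumption: cad_weight = 0 means text-only guidance and cad_weight = 1
    means image-only guidance. The paper's actual weighting may differ.
    """
    if not 0.0 <= cad_weight <= 1.0:
        raise ValueError("cad_weight must lie in [0, 1]")
    return [
        (1.0 - cad_weight) * t + cad_weight * i
        for t, i in zip(text_embedding, image_embedding)
    ]


# Toy stand-ins for CLIP embeddings (hypothetical values and file names).
text_embedding = [1.0, 0.0, 0.0]
cad_embeddings = {
    "road_bike.png": [0.9, 0.1, 0.0],
    "tandem_bike.png": [0.0, 1.0, 0.0],
}

best_match = retrieve_cad_image(text_embedding, cad_embeddings)
low_weight_guidance = blend_prompts(
    text_embedding, cad_embeddings[best_match], cad_weight=0.35
)
```

Under this assumed blend, a low weight such as the 0.35 mentioned in the abstract keeps the guidance close to the text prompt, while weights approaching 1 pull it toward the retrieved CAD image, which mirrors the feasibility and novelty tradeoff reported in Figure 3.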