CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers

Dimitrios Mallis, Ahmet Serdar Karadeniz, Sebastian Cavada, Danila Rukhovich, Niki Foteinopoulou, Kseniya Cherenkova, Anis Kacem, Djamila Aouada

TL;DR

CAD-Assistant is a general-purpose, tool-augmented VLLM framework for AI-assisted CAD. It combines a FreeCAD environment, accessed through FreeCAD's Python API, with a diverse CAD toolset to translate multimodal user inputs into executable CAD commands. A VLLM planner (GPT-4o) generates plans and Python actions that are executed within FreeCAD, and the system iteratively refines the geometry based on feedback about the evolving design state, which helps compensate for the limited geometric reasoning of VLLMs. Key contributions include a training-free, extensible toolset; multimodal CAD representations; and an evaluation on CAD benchmarks (SGPBench, autoconstraining, hand-drawn parameterization) showing improvements over VLLM baselines and task-specific methods. Qualitative demonstrations cover real-world workflows such as 3D reconstruction from sketches and cross-section parameterization. The results suggest that tool-augmented VLLMs can act as generic CAD solvers that produce interpretable, editable CAD code for designers and automated CAD pipelines.

Abstract

We propose CAD-Assistant, a general-purpose CAD agent for AI-assisted design. Our approach is based on a powerful Vision and Large Language Model (VLLM) as a planner and a tool-augmentation paradigm using CAD-specific tools. CAD-Assistant addresses multimodal user queries by generating actions that are iteratively executed on a Python interpreter equipped with the FreeCAD software, accessed via its Python API. Our framework is able to assess the impact of generated CAD commands on geometry and adapts subsequent actions based on the evolving state of the CAD design. We consider a wide range of CAD-specific tools including a sketch image parameterizer, rendering modules, a 2D cross-section generator, and other specialized routines. CAD-Assistant is evaluated on multiple CAD benchmarks, where it outperforms VLLM baselines and supervised task-specific methods. Beyond existing benchmarks, we qualitatively demonstrate the potential of tool-augmented VLLMs as general-purpose CAD solvers across diverse workflows.

Paper Structure

This paper contains 28 sections, 4 equations, 19 figures, 12 tables.

Figures (19)

  • Figure 1: CAD-Assistant is a tool-augmented VLLM framework for AI-assisted CAD. Our framework generates FreeCAD code that is executed directly within CAD software and can process multimodal inputs, including textual queries, sketches, drawn commands and 3D scans. This figure showcases various examples of generic CAD queries and the responses generated by CAD-Assistant.
  • Figure 2: Overview of the CAD-Assistant framework. A multimodal user request is provided as context to a VLLM planner $\mathcal{P}$. At step $t$, the planner generates a plan $p_t$ and an action $a_t$ (Python code). The action is executed on an environment $\mathcal{E}$ and the generated execution output $f_t$ is fed back to the planner, enabling generation for the next timestep.
  • Figure 3: Execution flow for autoconstraining. The sketch recognizer function is utilized for multimodal CAD understanding. Constraints are generated over multiple timesteps.
  • Figure 4: Classification of failure case types for erroneous responses in the CAD Question Answering task.
  • Figure 5: Real-world CAD use cases. (Left) CAD-Assistant generates a 3D solid conditioned on a hand-drawn sketch image. (Center) Our method reconstructs a 3D scan via cross-section parameterization. (Right) CAD-Assistant semantically interprets the drawn operation and fulfills user requests directly without composing CAD-specific tools.
  • ...and 14 more figures
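
The loop described in the Figure 2 caption (plan $p_t$, action $a_t$, execution output $f_t$) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `Environment`, `run_agent`, and `scripted_planner` are hypothetical names, the planner replays a fixed script instead of calling a VLLM, and a plain dictionary stands in for FreeCAD geometry.

```python
import contextlib
import io
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# One history entry per timestep: (plan p_t, action a_t, execution output f_t).
Step = Tuple[str, str, str]
# Hypothetical planner signature: (request, history) -> (plan, action, done).
Planner = Callable[[str, List[Step]], Tuple[str, str, bool]]


@dataclass
class Environment:
    """Stand-in for the environment E: a persistent Python namespace.

    In the paper the interpreter is equipped with FreeCAD; here it is a
    bare namespace so the sketch runs without any CAD software installed.
    """
    namespace: dict = field(default_factory=dict)

    def execute(self, action: str) -> str:
        """Run the action a_t and return its captured output as f_t."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(action, self.namespace)
        except Exception as exc:
            # Errors are returned as feedback so the planner can react to them.
            buffer.write(f"{type(exc).__name__}: {exc}")
        return buffer.getvalue().strip()


def run_agent(request: str, planner: Planner, env: Environment,
              max_steps: int = 10) -> List[Step]:
    """Alternate planning and execution until the planner signals completion."""
    history: List[Step] = []
    for _ in range(max_steps):
        plan, action, done = planner(request, history)
        if done:
            break
        feedback = env.execute(action)
        history.append((plan, action, feedback))
    return history


def scripted_planner(request: str, history: List[Step]) -> Tuple[str, str, bool]:
    """Hypothetical planner that replays a fixed script (no VLLM involved)."""
    script = [
        ("Create a line segment",
         "line = {'start': (0, 0), 'end': (4, 0.2)}\nprint(line)"),
        ("Make the line horizontal",
         "line['end'] = (line['end'][0], line['start'][1])\nprint(line)"),
    ]
    step = len(history)
    if step >= len(script):
        return "Request fulfilled", "", True
    plan, action = script[step]
    return plan, action, False


if __name__ == "__main__":
    for plan, _, feedback in run_agent("Make the line horizontal",
                                       scripted_planner, Environment()):
        print(f"{plan} -> {feedback}")
```

Keeping the namespace alive between calls is what lets a later action refer to objects created by an earlier one, mirroring how each step in the described framework acts on the evolving state of the design.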