CADSmith: Multi-Agent CAD Generation with Programmatic Geometric Validation

Jesse Barkley, Rumi Loghmani, Amir Barati Farimani

Abstract

Existing methods for text-to-CAD generation either operate in a single pass with no geometric verification or rely on lossy visual feedback that cannot resolve dimensional errors. We present CADSmith, a multi-agent pipeline that generates CadQuery code from natural language. The generated code is then refined iteratively through two nested correction loops: an inner loop that resolves execution errors and an outer loop grounded in programmatic geometric validation. The outer loop combines exact measurements from the OpenCASCADE kernel (bounding box dimensions, volume, solid validity) with holistic visual assessment from an independent vision-language model Judge. Together, these provide both the numerical precision and the high-level shape awareness needed to converge on the correct geometry. The system uses retrieval-augmented generation over API documentation rather than fine-tuning, so the documentation database can be kept current as the underlying CAD library evolves. We evaluate on a custom benchmark of 100 prompts in three difficulty tiers (T1 through T3) with three ablation configurations. Against a zero-shot baseline, CADSmith achieves a 100% execution rate (up from 95%), improves the median F1 score from 0.9707 to 0.9846 and the median IoU from 0.8085 to 0.9629, and reduces the mean Chamfer Distance from 28.37 to 0.74. These results demonstrate that closed-loop refinement with programmatic geometric feedback substantially improves the quality and reliability of LLM-generated CAD models.
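The two nested correction loops described in the abstract can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the names `refine`, `RunResult`, `generate`, `execute`, and `validate`, the iteration limits, and the toy stand-ins for the agents are all hypothetical, and the "code" here is a plain string rather than real CadQuery. The sketch only shows how an inner loop that repairs execution errors can sit inside an outer loop driven by geometric feedback.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RunResult:
    ok: bool                        # did the generated code execute?
    error: str = ""                 # error text fed back to the inner loop
    metrics: Optional[dict] = None  # e.g. bounding box and volume from the kernel


def refine(
    prompt: str,
    generate: Callable[[str, str, str], str],
    execute: Callable[[str], RunResult],
    validate: Callable[[str, dict], Tuple[bool, str]],
    max_outer: int = 3,   # hypothetical limits, not taken from the paper
    max_inner: int = 3,
) -> Tuple[str, bool]:
    """Return (final code, converged flag) after nested correction loops."""
    code, feedback = "", ""
    for _ in range(max_outer):
        # Outer loop: regenerate using geometric feedback from validation.
        code = generate(prompt, code, feedback)
        result = execute(code)
        # Inner loop: repair execution errors before any geometry check.
        for _ in range(max_inner):
            if result.ok:
                break
            code = generate(prompt, code, "execution error: " + result.error)
            result = execute(code)
        if not result.ok:
            feedback = "execution error: " + result.error
            continue
        passed, feedback = validate(prompt, result.metrics)
        if passed:
            return code, True
    return code, False


# Toy stand-ins for the agents (assumptions for illustration only).
TARGET_BBOX = (10.0, 10.0, 5.0)


def toy_generate(prompt: str, previous: str, feedback: str) -> str:
    if previous == "":
        return "bug"                 # first draft fails to execute
    if feedback.startswith("execution error"):
        return "box 10 10 4"         # runs, but the height is wrong
    return "box 10 10 5"             # corrected after geometric feedback


def toy_execute(code: str) -> RunResult:
    if code == "bug":
        return RunResult(ok=False, error="NameError")
    x, y, z = (float(v) for v in code.split()[1:])
    return RunResult(ok=True, metrics={"bbox": (x, y, z), "volume": x * y * z})


def toy_validate(prompt: str, metrics: dict) -> Tuple[bool, str]:
    if metrics["bbox"] != TARGET_BBOX:
        return False, f"height is {metrics['bbox'][2]}, expected {TARGET_BBOX[2]}"
    return True, ""


final_code, converged = refine("a 10 x 10 x 5 box", toy_generate, toy_execute, toy_validate)
```

In this toy run the first draft fails to execute and is repaired by the inner loop, the repaired draft fails the bounding-box check, and the second outer iteration produces a part that passes validation.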

Paper Structure

This paper contains 20 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: CADSmith pipeline overview. A natural language prompt flows through five agents with two nested correction loops: an inner loop for execution errors and an outer loop for geometric refinement driven by kernel metrics and three-view vision.
  • Figure 2: Representative benchmark parts across three difficulty tiers. T1 parts are single primitives, T2 parts involve boolean combinations and hole patterns, and T3 parts require multi-step construction with workplane changes, sweeps, and complex feature interactions.
  • Figure 4: Three-view render provided to the Validator Judge at each iteration. Left: isometric view showing overall shape. Center: high-angle rear view revealing top-face features (bolt holes, center bore). Right: front profile showing the vertical structure (flanges, hub, keyway). These views are rendered from the generated STL using VTK with Phong shading. The Judge cross-references what it sees in these views against the kernel metrics and the original prompt to catch failures that numerical checks alone would miss.
  • Figure 6: T3_019 (quadcopter frame): F1 = 0.963, IoU = 0.985. The part passed all validation checks, but contains subtle gaps between the arms and central hub (visible in the front profile view at right). This near-miss failure evaded both kernel metrics and the vision Judge.
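The F1, IoU, and Chamfer Distance figures quoted above can be made concrete with a small sketch. The paper's exact metric definitions (sampling density, distance threshold, voxel resolution) are not given in this section, so the code below uses common textbook forms as an assumption: IoU over occupied voxel sets, Chamfer Distance as the sum of the two mean nearest-neighbour distances, and F1 as the harmonic mean of precision and recall at a hypothetical threshold `tau`. All function names are illustrative.

```python
import math
from typing import Iterable, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_iou(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Intersection over union of two sets of occupied voxel indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _nearest(p: Point, cloud: Iterable[Point]) -> float:
    """Brute-force distance from p to its nearest neighbour in cloud."""
    return min(math.dist(p, q) for q in cloud)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Sum of the mean nearest-neighbour distances in both directions."""
    a_to_b = sum(_nearest(p, b) for p in a) / len(a)
    b_to_a = sum(_nearest(q, a) for q in b) / len(b)
    return a_to_b + b_to_a


def f1_score(pred: Sequence[Point], ref: Sequence[Point], tau: float) -> float:
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = sum(_nearest(p, ref) <= tau for p in pred) / len(pred)
    recall = sum(_nearest(q, pred) <= tau for q in ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reference: a 4 x 4 x 4 voxel block; prediction: the same block one layer short.
ref_voxels = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
pred_voxels = {(x, y, z) for x in range(4) for y in range(4) for z in range(3)}
iou = voxel_iou(pred_voxels, ref_voxels)

# Two tiny point clouds that differ in one point by a unit offset.
ref_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(pred_points, ref_points)
f1 = f1_score(pred_points, ref_points, tau=0.5)
```

Under these definitions, a part can score well on every metric while still containing a small local defect such as the gaps noted for T3_019, because each metric averages over the whole shape.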