CADEvolve: Creating Realistic CAD via Program Evolution
Maksim Elistratov, Marina Barannikov, Gregory Ivanov, Valentin Khrulkov, Anton Konushin, Andrey Kuznetsov, Dmitrii Zhemchuzhnikov
TL;DR
CADEvolve tackles the data scarcity challenge in CAD automation by introducing an offline evolution-based data synthesis pipeline that progressively builds rich, multi-operation CadQuery programs from simple primitives. A three-tier CADEvolve-3L dataset (G, P, C) and the CADEvolve-M model enable state-of-the-art Image2CAD reconstruction across multiple benchmarks through supervised fine-tuning and reinforcement learning with geometry-aware rewards. The approach combines LLM-guided proposals, staged validation, canonicalization, and diverse augmentations to produce a scalable, executable CAD corpus, addressing limitations of sketch–extrude corpora and restricted operator sets. The results demonstrate improved reconstruction fidelity and generalization, suggesting that principled data synthesis can substantially boost multimodal CAD workflows and other program-synthesis tasks in engineering domains.
Abstract
Computer-Aided Design (CAD) delivers rapid, editable modeling for engineering and manufacturing. Recent AI progress now makes full automation feasible for various CAD tasks. However, progress is bottlenecked by data: public corpora mostly contain sketch-extrude sequences and lack complex operations, multi-operation composition, and design intent, which hinders effective fine-tuning. Attempts to bypass this with frozen VLMs often yield simple or invalid programs due to limited 3D grounding in current foundation models. We present CADEvolve, an evolution-based pipeline and dataset that starts from simple primitives and, via VLM-guided edits and validations, incrementally grows CAD programs toward industrial-grade complexity. The result is 8k complex parts expressed as executable CadQuery parametric generators. After multi-stage post-processing and augmentation, we obtain a unified dataset of 1.3M scripts paired with rendered geometry and exercising the full CadQuery operation set. A VLM fine-tuned on CADEvolve achieves state-of-the-art results on the Image2CAD task across the DeepCAD, Fusion 360, and MCB benchmarks.
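To make the propose-then-validate idea concrete, the following is a minimal sketch of an evolution loop over CadQuery-style program strings. It is not the authors' pipeline: the names `SEED`, `CANDIDATE_EDITS`, `propose_edit`, `is_valid`, and `evolve` are hypothetical, a random choice stands in for the VLM proposal, and validation is reduced to a syntax check, whereas the paper describes staged validation that would also involve executing the program and inspecting the geometry.

```python
import ast
import random

# Hypothetical seed primitive, written as CadQuery source text.
SEED = "cq.Workplane('XY').box(10, 10, 5)"

# Hypothetical edit snippets; a real pipeline would obtain edits from a VLM.
CANDIDATE_EDITS = [
    ".faces('>Z').workplane().hole(2.0)",
    ".edges('|Z').fillet(0.5)",
    ".faces('>Z').shell(-0.5)",
    ".edges('>Z').chamfer(0.3",  # deliberately malformed: missing ")"
]


def propose_edit(program, rng):
    """Stand-in for a VLM proposal: append one randomly chosen operation."""
    return program + rng.choice(CANDIDATE_EDITS)


def is_valid(program):
    """Syntax-only check: the program must parse as a Python expression.

    Assumption: later stages (execution, geometry checks) are omitted here.
    """
    try:
        ast.parse(program, mode="eval")
    except SyntaxError:
        return False
    return True


def evolve(seed, generations, rng):
    """Grow a population of programs, keeping only new, valid children."""
    population = [seed]
    for _ in range(generations):
        parent = rng.choice(population)
        child = propose_edit(parent, rng)
        if is_valid(child) and child not in population:
            population.append(child)
    return population


population = evolve(SEED, generations=20, rng=random.Random(0))
```

Each surviving program extends a previously accepted one, so complexity grows incrementally from the seed primitive, while malformed proposals are discarded before they can become parents.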
