CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing

Yu Yuan, Shizhao Sun, Qi Liu, Jiang Bian

TL;DR

This work tackles text-based CAD editing by introducing CAD-Editor, which combines an automated data synthesis pipeline with a locate-then-infill framework powered by LLMs and LVLMs. It constructs triplet training data (editing instruction, original SE sequence, edited SE sequence) by pairing design variations with difference summarization, and then decomposes editing into locating regions to modify and infilling precise edits. Empirical results on the DeepCAD dataset show CAD-Editor achieves superior validity and text-CAD alignment, outperforming baselines and ablations across multiple metrics and scenarios. The approach offers a practical path to editable, instruction-driven CAD models while highlighting opportunities for cost-efficient data generation and broader benchmarking.

Abstract

Computer-Aided Design (CAD) is indispensable across various industries. Text-based CAD editing, which automates the modification of CAD models based on textual instructions, holds great potential but remains underexplored. Existing methods primarily focus on design variation generation or text-based CAD generation, either lacking support for text-based control or neglecting existing CAD models as constraints. We introduce CAD-Editor, the first framework for text-based CAD editing. To address the challenge of demanding triplet data with accurate correspondence for training, we propose an automated data synthesis pipeline. This pipeline utilizes design variation models to generate pairs of original and edited CAD models and employs Large Vision-Language Models (LVLMs) to summarize their differences into editing instructions. To tackle the composite nature of text-based CAD editing, we propose a locate-then-infill framework that decomposes the task into two focused sub-tasks: locating regions requiring modification and infilling these regions with appropriate edits. Large Language Models (LLMs) serve as the backbone for both sub-tasks, leveraging their capabilities in natural language understanding and CAD knowledge. Experiments show that CAD-Editor achieves superior performance both quantitatively and qualitatively. The code is available at https://github.com/microsoft/CAD-Editor.
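
The two-stage decomposition described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `<mask>` placeholder, the toy token format, and the names `apply_mask`, `fill_masks`, `edit`, `toy_locator`, and `toy_infiller` are hypothetical and are not taken from the CAD-Editor code. In the actual framework both stages are LLMs; here they are replaced by trivial stand-in functions so the control flow can be run end to end.

```python
from typing import Callable, List, Set

MASK = "<mask>"  # hypothetical placeholder token, assumed for this sketch

def apply_mask(tokens: List[str], positions: Set[int]) -> List[str]:
    """Replace the tokens at the given positions with MASK."""
    return [MASK if i in positions else t for i, t in enumerate(tokens)]

def fill_masks(masked: List[str], fills: List[str]) -> List[str]:
    """Substitute each MASK, left to right, with the next fill token."""
    if masked.count(MASK) != len(fills):
        raise ValueError("number of fills must equal number of masks")
    it = iter(fills)
    return [next(it) if t == MASK else t for t in masked]

Locator = Callable[[str, List[str]], List[str]]
Infiller = Callable[[str, List[str], List[str]], List[str]]

def edit(instruction: str, original: List[str],
         locator: Locator, infiller: Infiller) -> List[str]:
    """Locate-then-infill: first mark what to change, then fill it in."""
    masked = locator(instruction, original)           # stage 1: locate
    fills = infiller(instruction, original, masked)   # stage 2: infill
    return fill_masks(masked, fills)

# Stand-ins for the two LLM stages (toy logic, not the paper's models).
def toy_locator(instruction: str, tokens: List[str]) -> List[str]:
    # Mask the token that follows each "extrude" keyword.
    positions = {i + 1 for i, t in enumerate(tokens[:-1]) if t == "extrude"}
    return apply_mask(tokens, positions)

def toy_infiller(instruction: str, original: List[str],
                 masked: List[str]) -> List[str]:
    # Always propose "10" for every masked slot.
    return ["10"] * masked.count(MASK)

original = ["line", "0,0", "10,0", "extrude", "5"]  # toy sequence, not real SE syntax
edited = edit("make the extrusion taller", original, toy_locator, toy_infiller)
print(edited)
```

Splitting the task this way means the second stage only has to produce content for the masked slots, while every unmasked token of the original sequence is carried through unchanged.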

Paper Structure

This paper contains 21 sections, 4 equations, 16 figures, and 5 tables.

Figures (16)

  • Figure 1: Text-based CAD editing achieved by CAD-Editor. Each sub-figure shows the editing instruction at the top, the original CAD model on the left, and the edited CAD model on the right. The rendered image is shown for better comprehension. The actual editing occurs on sketch-and-extrusion (SE) operations of a CAD model to provide editability and reusability.
  • Figure 2: Left: Example input and output for CAD-Editor. The input combines the original CAD sequence with the editing instruction, and the output is the edited CAD sequence. The specific CAD sequence is shortened to '[Original (or Edited) CAD Sequence]' to save space. Right: An illustration for a specific CAD sequence and its rendered CAD model.
  • Figure 3: Illustration of automated data synthesis pipeline.
  • Figure 4: (a)-(b): Overview of Locate-then-Infill framework. (c): Examples of input and output, where the left column shows abstracted representations using legends, the middle column displays concrete sequences and the right column presents rendered visual objects.
  • Figure 5: Qualitative results from CAD-Editor, GPT-4o-Basic and GPT-4o-IC.
  • ...and 11 more figures