From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach

Xilin Wang, Jia Zheng, Yuanchao Hu, Hao Zhu, Qian Yu, Zihan Zhou

TL;DR

This work tackles reconstructing 3D parametric models from 2D CAD drawings by treating the input as a raster image processed by a ViT and producing a flexible, text-based script that describes a sequence of primitives. The CAD2Program pipeline combines a vision-language foundation model (ViT+LLM) with a Python-based shape program to represent arbitrary primitives without fixed slot templates, enabling scalable handling of model-specific parameters. A large cabinet dataset (368K models, 373 primitives) supports supervised fine-tuning, and experiments show that raster inputs plus annotation layers improve accuracy while the text-based output matches domain-specific representations in performance and offers greater flexibility. The approach demonstrates robust reconstruction across diverse drawings and hints at broad applicability to other CAD domains and future tasks like CAD-oriented visual question answering.

Abstract

In this paper, we present CAD2Program, a new method for reconstructing 3D parametric models from 2D CAD drawings. Our proposed method is inspired by recent successes in vision-language models (VLMs), and departs from traditional methods which rely on task-specific data representations and/or algorithms. Specifically, on the input side, we simply treat the 2D CAD drawing as a raster image, regardless of its original format, and encode the image with a standard ViT model. We show that such an encoding scheme achieves competitive performance against existing methods that operate on vector-graphics inputs, while imposing substantially fewer restrictions on the 2D drawings. On the output side, our method auto-regressively predicts a general-purpose language describing 3D parametric models in text form. Compared to other sequence modeling methods for CAD which use domain-specific sequence representations with fixed-size slots, our text-based representation is more flexible, and can be easily extended to arbitrary geometric entities and semantic or functional properties. Experimental results on a large-scale dataset of cabinet models demonstrate the effectiveness of our method.
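To make the contrast with fixed-size slots concrete, here is a minimal sketch of how a text-based shape program might be read back into structured primitives. The program syntax (`create_model`, `place`, the `door` primitive, and all parameter values) is hypothetical and is not taken from the paper; only the general idea of one model ID plus a free-form set of parameters per primitive comes from the text above.

```python
import ast

# Hypothetical shape program: function names, model IDs, and values are
# illustrative assumptions, not the paper's actual output format.
PROGRAM = '''
box_0 = create_model(model_id="base_box", W=800, D=560, H=720, N=2, NKA=400, NKB=400, DBXX=1)
box_0.place(x=0, y=0, z=0)
door_0 = create_model(model_id="door", W=400, H=720)
door_0.place(x=0, y=560, z=0)
'''


def literal_kwargs(call):
    """Return the keyword arguments of an ast.Call as plain Python literals."""
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


def parse_program(src):
    """Parse the program text without executing it.

    Each primitive keeps its own parameter dictionary, so different
    primitives may carry different numbers and kinds of parameters.
    """
    primitives = {}
    for node in ast.parse(src).body:
        # Lines of the form: name = create_model(...)
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            name = node.targets[0].id
            primitives[name] = {"params": literal_kwargs(node.value), "pose": None}
        # Lines of the form: name.place(...)
        elif isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            target = node.value.func.value.id
            primitives[target]["pose"] = literal_kwargs(node.value)
    return primitives


if __name__ == "__main__":
    for name, spec in parse_program(PROGRAM).items():
        print(name, spec["params"], spec["pose"])
```

Because the parameters are parsed as keyword arguments rather than read from fixed positions, the hypothetical `door` primitive can simply omit `N`, `NKA`, `NKB`, and `DBXX`, and a new primitive type would need no change to the parser.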

Paper Structure

This paper contains 17 sections, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Illustration of the geometry and annotation layers of a CAD drawing. See text for details.
  • Figure 2: Problem statement. Given (a) a 2D CAD drawing of a product (e.g., a cabinet), our goal is to reconstruct (b) the 3D model of the product. (c) In CAD software, the 3D model is conventionally built by assembling pre-defined primitive models, where (d) each primitive model is defined by a computer program describing its model ID and a number of parameters.
  • Figure 3: Illustration of the model-specific parameters of a "base box" primitive. N is the number of vertically divided spaces in the box. [NKA, NKB, $\ldots$] are the widths of the divided spaces. DBXX indicates the position of the frame, where DBXX=1 means "no frame", DBXX=2 means "lower frame", and DBXX=3 means "upper frame".
  • Figure 4: Python shape program describing the cabinet in Figure 2. Every two lines correspond to a primitive model in Figure 2(c).
  • Figure 5: An overview of the CAD2Program model.
  • ...and 7 more figures
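
The model-specific parameters described in the Figure 3 caption can be sketched as a small data structure. This is an illustrative reading of the caption only: the class and field names are hypothetical, and two points are assumptions rather than facts from the paper, namely that there is exactly one width per divided space and that the width names continue alphabetically after NKA and NKB.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List
import string


class FramePosition(IntEnum):
    """DBXX codes as listed in the Figure 3 caption."""
    NO_FRAME = 1
    LOWER_FRAME = 2
    UPPER_FRAME = 3


@dataclass
class BaseBoxParams:
    """Hypothetical container for the "base box" parameters."""
    n: int                 # N: number of vertically divided spaces
    widths: List[float]    # [NKA, NKB, ...]: widths of the divided spaces
    dbxx: FramePosition    # DBXX: position of the frame

    def __post_init__(self):
        if self.n < 1:
            raise ValueError("N must be at least 1")
        # Assumption: one width is given for each divided space.
        if len(self.widths) != self.n:
            raise ValueError("expected one width per divided space")
        if self.n > len(string.ascii_uppercase):
            raise ValueError("this sketch only names up to 26 widths")
        # Raises ValueError for codes other than 1, 2, or 3.
        self.dbxx = FramePosition(self.dbxx)

    def to_kwargs(self) -> Dict[str, float]:
        """Flatten to keyword-style parameters: N, NKA, NKB, ..., DBXX."""
        out: Dict[str, float] = {"N": self.n}
        # Assumption: suffix letters continue alphabetically (NKC, NKD, ...).
        for letter, width in zip(string.ascii_uppercase, self.widths):
            out["NK" + letter] = width
        out["DBXX"] = int(self.dbxx)
        return out


if __name__ == "__main__":
    box = BaseBoxParams(n=2, widths=[400.0, 400.0], dbxx=2)
    print(box.dbxx.name, box.to_kwargs())
```

The number of width entries depends on N, which is the kind of variable-length, model-specific parameter set that a fixed-slot sequence representation handles awkwardly and a text-based program can express directly.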