CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs

Haocheng Yuan, Jing Xu, Hao Pan, Adrien Bousseau, Niloy J. Mitra, Changjian Li

TL;DR

This work introduces the problem of semantic commenting of CAD programs: segmenting an input program into code blocks that correspond to semantically meaningful shape parts, and assigning a semantic label to each block.

Abstract

CAD programs are a popular way to compactly encode shapes as a sequence of operations that are easy to parametrically modify. However, without sufficient semantic comments and structure, such programs can be challenging to understand, let alone modify. We introduce the problem of semantic commenting of CAD programs, wherein the goal is to segment the input program into code blocks corresponding to semantically meaningful shape parts and assign a semantic label to each block. We solve the problem by combining program parsing with visual-semantic analysis afforded by recent advances in foundational language and vision models. Specifically, we execute the input programs to create shapes, then use these shapes to generate conditional photorealistic images, so that semantic annotators built for such images can be applied. We then distill the information across the images and link it back to the original programs to semantically comment on them. Additionally, we collected and annotated a benchmark dataset, CADTalk, consisting of 5,288 machine-made programs and 45 human-made programs with ground truth semantic comments. We extensively evaluated our approach, compared it to a GPT-based baseline and an open-set shape segmentation baseline, and report an 83.24% accuracy on the new CADTalk dataset. Code and data: https://enigma-li.github.io/CADTalk/.
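
The "distill the information across the images" step can be pictured as a vote over views. The sketch below is only an illustration under stated assumptions, not the paper's actual aggregation rule: it assumes each rendered view yields, for every visible code block, one candidate label with a confidence score, and it sums scores per label. The function name `aggregate_labels`, the block ids, and the scores are all hypothetical.

```python
from collections import defaultdict


def aggregate_labels(view_predictions):
    """Combine per-view label scores into one label per code block.

    view_predictions: list of dicts, one per rendered view, mapping a
    (hypothetical) block id to a (label, score) pair. A block that is
    not visible in a view is simply absent from that view's dict.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for view in view_predictions:
        for block_id, (label, score) in view.items():
            totals[block_id][label] += score
    # For every block, keep the label with the highest accumulated score.
    return {
        block_id: max(scores, key=scores.get)
        for block_id, scores in totals.items()
    }


# Made-up predictions for three views of a chair-like shape.
views = [
    {"block_0": ("seat", 0.9), "block_1": ("leg", 0.6)},
    {"block_0": ("seat", 0.7), "block_1": ("arm", 0.5)},
    {"block_1": ("leg", 0.4)},
]
comments = aggregate_labels(views)
```

Summing scores rather than counting votes lets a confident view outweigh several uncertain ones; a plain majority vote would be an equally reasonable choice for this sketch.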

Paper Structure (30 sections, 1 equation, 14 figures, 8 tables)

This paper contains 30 sections, 1 equation, 14 figures, 8 tables.

Figures (14)

  • Figure 1: Given a CAD program as input, our algorithm (CADTalker) automatically generates a comment before each code block to describe the shape part that the block generates (left). We evaluate our algorithm on a new dataset of commented CAD programs (CADTalk) that contains both human-made and machine-made CAD programs (right).
  • Figure 2: Algorithm overview. We first parse the input program to identify commentable code blocks, marked with TBC (a). We then execute the program and render the resulting shape under several viewpoints to obtain multiview depth maps, which we convert into realistic images using image-to-image translation (b). In addition, we obtain a list of part names of the shape from ChatGPT. We use these labels to segment semantic parts in the images using computer vision foundation models (c). Finally, we aggregate this semantic information across views by linking it to the code blocks that correspond to the segmented parts (d).
  • Figure 3: Given shapes (left) obtained by executing programs, we use ControlNet [zhang2023ControlNet] to convert rendered depth maps into realistic images (middle), which form a valid input for detection and segmentation models trained on photographs [liu2023grounding, kirillov2023segment] (right).
  • Figure 4: Program parsing. Irreducible blocks are basic-level geometric primitives and their direct compositions (a), while commentable blocks are code blocks of different compositional levels that correspond to semantic comments (b). The downward traversal of the syntax tree is used to identify irreducible blocks (c) and the upward traversal to collect commentable blocks (d). Exemplar masks of commentable blocks are shown in (c) and (d) in red.
  • Figure 5: CADTalk Dataset. Example shapes from CADTalk (left) along with ground-truth (right) and predicted comments (far right). In these examples, our prediction matches the ground truth, except for the Moai sculpture where CADTalker labeled the "head" code block as "body". Machine-made shapes are rendered with dark blue and placed behind the human-made shapes rendered with light blue.
  • ...and 9 more figures
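
The two traversals described in the Figure 4 caption can be sketched as follows. This is a simplified reading, not the authors' parser: it assumes a syntax tree whose leaves are geometric primitives, treats a node as irreducible when it is a primitive or composes only primitives, and treats every node at or above an irreducible block as commentable. The `Node` class and all node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)

    @property
    def is_primitive(self):
        # Assumption: leaves of the syntax tree are geometric primitives.
        return not self.children


def find_irreducible(node):
    """Downward traversal: stop at a primitive or a direct composition of primitives."""
    if node.is_primitive or all(c.is_primitive for c in node.children):
        return [node]
    found = []
    for child in node.children:
        found.extend(find_irreducible(child))
    return found


def collect_commentable(root):
    """Upward pass: gather irreducible blocks first, then their ancestors."""
    irreducible = {id(n) for n in find_irreducible(root)}
    commentable = []

    def visit(node):
        hit = id(node) in irreducible
        if not hit:
            # Visit every child (no short-circuit) so all branches are collected.
            results = [visit(c) for c in node.children]
            hit = any(results)
        if hit:
            commentable.append(node.name)
        return hit

    visit(root)
    return commentable


# A made-up chair: a seat primitive, two legs unioned, and a carved back.
root = Node("union_chair", [
    Node("cube_seat"),
    Node("union_legs", [Node("cyl_a"), Node("cyl_b")]),
    Node("diff_back", [Node("cube_c"), Node("cube_d")]),
])
irreducible_names = [n.name for n in find_irreducible(root)]
commentable_names = collect_commentable(root)
```

Here the leg and back compositions are kept whole instead of being split into individual cylinders and cubes, and the root is added last as the coarsest commentable block.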