ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model

Rao Fu, Xiao Zhan, Yiwen Chen, Daniel Ritchie, Srinath Sridhar

TL;DR

ShapeCrafter introduces recursive text-conditioned 3D shape generation: a shape distribution evolves as phrases are appended to the input sequence. The Text2Shape++ dataset enables training on long, varied phrase inputs, while a P-VQ-VAE latent grid and a transformer-based autoregressive model realize iterative, detail-preserving shape refinement. The approach achieves competitive shape quality and superior text–shape alignment, with demonstrated editing and extrapolation capabilities, particularly on the chair and table categories. Limitations include restricted appearance attributes, a non-reversible process, and potential biases from the dataset's scope.

Abstract

We present ShapeCrafter, a neural network for recursive text-conditioned 3D shape generation. Existing methods to generate text-conditioned 3D shapes consume an entire text prompt to generate a 3D shape in a single step. However, humans tend to describe shapes recursively: we may start with an initial description and progressively add details based on intermediate results. To capture this recursive process, we introduce a method to generate a 3D shape distribution, conditioned on an initial phrase, that gradually evolves as more phrases are added. Since existing datasets are insufficient for training this approach, we present Text2Shape++, a large dataset of 369K shape-text pairs that supports recursive shape generation. To capture local details that are often used to refine shape descriptions, we build on top of vector-quantized deep implicit functions that generate a distribution of high-quality shapes. Results show that our method can generate shapes consistent with text descriptions, and shapes evolve gradually as more phrases are added. Our method supports shape editing and extrapolation, and can enable new applications in human-machine collaboration for creative design.
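
The contrast between single-step and recursive generation described above can be illustrated with a minimal sketch. It is not the paper's implementation: `generate_recursively` and `toy_update` are hypothetical names, the "state" is a plain set of words standing in for a learned shape distribution, and all phrases other than "a chair" are made up for illustration. The only point shown is that each new phrase updates the previous state, and every intermediate state remains available for inspection.

```python
from typing import Callable, List, Sequence, TypeVar

State = TypeVar("State")


def generate_recursively(
    phrases: Sequence[str],
    initial_state: State,
    update: Callable[[State, str], State],
) -> List[State]:
    """Fold phrases into a state one at a time, keeping every intermediate state."""
    states: List[State] = []
    state = initial_state
    for phrase in phrases:
        # The new state depends only on the previous state and the new phrase.
        state = update(state, phrase)
        states.append(state)
    return states


def toy_update(state: frozenset, phrase: str) -> frozenset:
    """Hypothetical stand-in for a learned update: accumulate the words seen so far."""
    return state | frozenset(phrase.lower().split())


# Hypothetical phrase sequence; a real system would hold a shape distribution as state.
phrases = ["a chair", "with four legs", "and a round seat"]
states = generate_recursively(phrases, frozenset(), toy_update)
```

In this sketch `states[0]` corresponds to the result after the initial phrase alone, and each later entry refines it, which mirrors the way a user could inspect intermediate results before adding another phrase.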

Paper Structure

This paper contains 23 sections, 10 equations, 10 figures, and 7 tables.

Figures (10)

  • Figure 1: ShapeCrafter is a method for recursive text-conditioned 3D shape generation. Given an initial phrase (A chair), it generates a shape distribution (4 samples shown in the leftmost column). As more phrases are added, the initial shape is refined (2 samples shown). We can handle long phrase sequences while continuously evolving an initial shape (shape editing for a fixed random seed). Our method also shows extrapolation capabilities (legs are armrests, bottom right).
  • Figure 2: An example from Text2Shape++. Constituency parser kitaev-klein-2018-constituency annotates a sentence with syntactic structure by decomposing it into phrases. Text2Shape++ contains phrase sequences, and each phrase sequence corresponds to one or more shapes.
  • Figure 3: ShapeCrafter learns the probability distribution of latent features for each cell in $Z$.
  • Figure 4: (Top) We take input text phrases and extract semantic features using BERT. These features are projected to a feature grid $C_t$ which is concatenated with the latent feature code distribution $Z_{t-1}$ from the previous time step. Residual blocks $\Psi(\cdot)$ output the feature grid distribution $Z_t$ for the current time step. (Bottom) During step $t$ of inference, we combine $C_t$ and $Z_{t-1}$ to obtain $Z_t$, which is sampled to produce 3D shapes.
  • Figure 5: Qualitative comparison with AutoSDF mittal2022autosdf. ShapeCrafter produces sequentially more consistent shapes compared to AutoSDF.
  • ...and 5 more figures
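
The update described in the captions of Figures 3 and 4 can be sketched in NumPy under heavy simplifying assumptions. This is a toy illustration, not the authors' architecture: a random vector stands in for the BERT phrase feature, a single linear layer with a `tanh` residual stands in for the residual blocks $\Psi(\cdot)$, and the grid size `G`, codebook size `K`, and feature sizes `D` and `F` are hypothetical. The names `update_step` and `sample_codes` are likewise made up for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def update_step(z_prev, text_feat, w_proj, w_mix):
    """Toy version of Z_t = Psi([C_t, Z_{t-1}]) with per-cell categorical outputs."""
    g = z_prev.shape[0]
    c_vec = text_feat @ w_proj                                # projected phrase feature, (F,)
    c_t = np.broadcast_to(c_vec, (g, g, g, c_vec.shape[0]))   # feature grid C_t
    h = np.concatenate([c_t, z_prev], axis=-1)                # concatenate C_t and Z_{t-1}
    # Residual update in log space (an assumption made for this sketch).
    logits = np.log(z_prev + 1e-9) + np.tanh(h @ w_mix)
    return softmax(logits)                                    # each cell sums to 1


def sample_codes(z, rng):
    """Draw one codebook index per grid cell by inverse-CDF sampling."""
    cdf = np.cumsum(z, axis=-1)
    u = rng.random(z.shape[:-1] + (1,))
    idx = (u > cdf).sum(axis=-1)
    return np.minimum(idx, z.shape[-1] - 1)                   # guard against rounding


rng = np.random.default_rng(0)
G, K, D, F = 4, 8, 16, 6                  # hypothetical sizes
w_proj = rng.normal(size=(D, F))          # random weights in place of trained ones
w_mix = rng.normal(size=(F + K, K))
z = np.full((G, G, G, K), 1.0 / K)        # start from a uniform distribution per cell
for _ in range(3):                        # three phrases, one update each
    text_feat = rng.normal(size=D)        # stand-in for a BERT phrase embedding
    z = update_step(z, text_feat, w_proj, w_mix)
codes = sample_codes(z, rng)              # one latent code index per cell
```

In a full pipeline the sampled indices would be looked up in a learned codebook and decoded into a 3D shape; that decoding stage is omitted here.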