ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model
Rao Fu, Xiao Zhan, Yiwen Chen, Daniel Ritchie, Srinath Sridhar
TL;DR
ShapeCrafter introduces recursive text-conditioned 3D shape generation: it maintains a shape distribution that evolves as phrases are appended to the description. The Text2Shape++ dataset enables training on long, varied phrase sequences, while a P-VQ-VAE latent grid and a transformer-based autoregressive model realize iterative, detail-preserving shape refinement. The approach achieves competitive shape quality and superior text–shape alignment, and demonstrates editing and extrapolation capabilities, particularly on the chair and table categories. Limitations include restricted appearance attributes, a non-reversible generation process, and potential biases from the dataset's scope.
Abstract
We present ShapeCrafter, a neural network for recursive text-conditioned 3D shape generation. Existing methods for text-conditioned 3D shape generation consume an entire text prompt and generate a 3D shape in a single step. However, humans tend to describe shapes recursively: we may start with an initial description and progressively add details based on intermediate results. To capture this recursive process, we introduce a method to generate a 3D shape distribution, conditioned on an initial phrase, that gradually evolves as more phrases are added. Since existing datasets are insufficient for training this approach, we present Text2Shape++, a large dataset of 369K shape-text pairs that supports recursive shape generation. To capture local details that are often used to refine shape descriptions, we build on top of vector-quantized deep implicit functions that generate a distribution of high-quality shapes. Results show that our method can generate shapes consistent with text descriptions, and that shapes evolve gradually as more phrases are added. Our method supports shape editing and extrapolation, and can enable new applications in human-machine collaboration for creative design.
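The recursive interface described above (a shape distribution that is updated each time a phrase is appended) can be illustrated with a toy sketch. This is not the ShapeCrafter model: the per-cell categorical distributions stand in for a distribution over vector-quantized codes on a latent grid, and `toy_phrase_logits`, `refine`, and `generate_recursively` are hypothetical names whose update rule (adding phrase-dependent logits to the current log-probabilities) is an assumption made purely for illustration.

```python
import math
from typing import List, Sequence

# One categorical distribution over codebook entries per latent grid cell.
Grid = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_grid(num_cells: int, codebook_size: int) -> Grid:
    """Initial state: every code is equally likely in every cell."""
    return [[1.0 / codebook_size] * codebook_size for _ in range(num_cells)]


def toy_phrase_logits(phrase: str, num_cells: int, codebook_size: int) -> List[List[float]]:
    """Hypothetical stand-in for a learned text-conditioned network.

    Produces deterministic pseudo-logits from the phrase's characters.
    """
    seed = sum(map(ord, phrase))
    return [
        [((seed + 31 * c + 17 * k) % 7) / 3.0 for k in range(codebook_size)]
        for c in range(num_cells)
    ]


def refine(grid: Grid, phrase: str) -> Grid:
    """Update the distribution with one new phrase (assumed update rule)."""
    logits = toy_phrase_logits(phrase, len(grid), len(grid[0]))
    return [
        softmax([math.log(p) + l for p, l in zip(probs, cell_logits)])
        for probs, cell_logits in zip(grid, logits)
    ]


def generate_recursively(phrases: Sequence[str], num_cells: int = 4,
                         codebook_size: int = 8) -> List[Grid]:
    """Return the distribution after each phrase, so intermediate states are inspectable."""
    grid = uniform_grid(num_cells, codebook_size)
    history = []
    for phrase in phrases:
        grid = refine(grid, phrase)  # the previous state is carried forward
        history.append(grid)
    return history


history = generate_recursively(["a chair", "with armrests", "and four legs"])
```

Each entry of `history` is the distribution after one more phrase, which mirrors the idea that a user can inspect an intermediate result before adding the next detail. In the actual method, the update would be produced by the trained model rather than by a fixed rule like the one assumed here.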
