Make-A-Shape: a Ten-Million-scale 3D Shape Model

Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Kamal Rahimi Malekshan, Zhengzhe Liu, Hooman Shayani, Chi-Wing Fu

TL;DR

This paper introduces Make-A-Shape, a new 3D generative model designed for efficient large-scale training and capable of utilizing 10 million publicly available shapes. The framework is further extended with additional input conditions so that it can generate shapes from assorted modalities.

Abstract

Significant progress has been made in training large generative models for natural language and images. Yet the advancement of 3D generative models is hindered by their substantial resource demands for training, along with inefficient, non-compact, and less expressive representations. This paper introduces Make-A-Shape, a new 3D generative model designed for efficient large-scale training and capable of utilizing 10 million publicly available shapes. Technically, we first introduce a wavelet-tree representation that compactly encodes shapes, formulating a subband coefficient filtering scheme to efficiently exploit coefficient relations. We then make the representation generatable by a diffusion model by devising a subband coefficient packing scheme that lays out the representation in a low-resolution grid. Further, we derive a subband adaptive training strategy so that our model effectively learns to generate both coarse and detail wavelet coefficients. Finally, we extend our framework with additional input conditions so that it can generate shapes from assorted modalities, e.g., single/multi-view images, point clouds, and low-resolution voxels. In our extensive set of experiments, we demonstrate various applications, such as unconditional generation, shape completion, and conditional generation on a wide range of modalities. Our approach not only surpasses the state of the art in delivering high-quality results but also generates shapes efficiently, within a few seconds and often in just 2 seconds for most conditions. Our source code is available at https://github.com/AutodeskAILab/Make-a-Shape.

Paper Structure

This paper contains 24 sections, 2 equations, 18 figures, and 8 tables.

Figures (18)

  • Figure 1: Make-A-Shape is a large 3D generative model trained on over 10 million diverse 3D shapes. As demonstrated above, it can unconditionally generate a large variety of 3D shapes over a wide range of object categories, featuring intricate geometric details, plausible structures, nontrivial topologies, and clean surfaces.
  • Figure 2: Make-A-Shape is able to generate a large variety of shapes for diverse input modalities: single-view images (rows 1 & 2), multi-view images (rows 3 & 4), point clouds (rows 5 & 6), voxels (rows 7 & 8), and incomplete inputs (last row). The voxel resolutions in rows 7 & 8 are $16^3$ and $32^3$, respectively. In the top eight rows, odd columns show the inputs whereas even columns show the generated shapes. In the last row, columns 1 & 4 show the partial inputs whereas the remaining columns show the diverse completed shapes.
  • Figure 3: Reconstructing the SDF of a shape (a) using different methods: (b) Point-E [nichol2022point], (c) Shap-E [jun2023shap], (d) coarse coefficients $C_0$ [hui2022neural], and (e) our wavelet-tree representation. Our approach (e) can more faithfully reconstruct the shape's structure and details.
  • Figure 4: Overview of our generative approach. (a) A shape is first encoded into a truncated signed distance field (TSDF), then decomposed into multi-scale wavelet coefficients in a wavelet-tree structure. We design the subband coefficient filtering procedure to exploit the relations among coefficients and extract information-rich coefficients to build our wavelet-tree representation. (b) We propose the subband coefficient packing scheme to rearrange our wavelet-tree representation into a regular grid structure of manageable spatial resolution, so that we can adopt a denoising diffusion model to effectively generate the representation. (c) Further, we formulate the subband adaptive training strategy to effectively balance the shape information in different subbands and address the detail coefficient sparsity. Hence, we can efficiently train our model on millions of 3D shapes. (d) Our framework can be extended to condition on various modalities.
  • Figure 5: Wavelet decomposition of the input shape, represented as a TSDF, recursively into coarse coefficients $C_i$ and detail coefficients $\{ D_i^{LH}, D_i^{HL}, D_i^{HH} \}$. Note that in the 3D case, there will be seven subbands of detail coefficients in each decomposition.
  • ...and 13 more figures
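
The recursive decomposition described in the Figure 5 caption can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a Haar filter purely for brevity (the captions above do not state which wavelet the paper uses), and the function names `sphere_tsdf`, `haar3d_level`, and `decompose`, as well as the band labels such as "LLH", are hypothetical. It builds a toy TSDF of a sphere, then splits it level by level into one coarse volume and seven detail subbands, matching the caption's note about the 3D case.

```python
from itertools import product
import math


def sphere_tsdf(n, radius, trunc):
    """Toy truncated signed distance field of a centred sphere on an n^3 grid."""
    c = (n - 1) / 2.0

    def sdf(i, j, k):
        return math.sqrt((i - c) ** 2 + (j - c) ** 2 + (k - c) ** 2) - radius

    # Truncation: clamp the signed distance to [-trunc, trunc].
    return [[[max(-trunc, min(trunc, sdf(i, j, k))) for k in range(n)]
             for j in range(n)] for i in range(n)]


def haar3d_level(vol):
    """One level of an orthonormal 3D Haar transform on a cubic volume.

    Returns (coarse, details): coarse is the low-pass "LLL" band and
    details is a dict holding the seven remaining subbands.
    """
    half = len(vol) // 2
    scale = 2 ** -1.5  # (1 / sqrt(2)) applied once per axis
    bands = {}
    for band in product("LH", repeat=3):
        out = [[[0.0] * half for _ in range(half)] for _ in range(half)]
        for i, j, k in product(range(half), repeat=3):
            acc = 0.0
            # Combine the 2x2x2 block; a high-pass axis flips the sign
            # of the second sample along that axis.
            for offsets in product((0, 1), repeat=3):
                sign = 1
                for b, d in zip(band, offsets):
                    if b == "H" and d == 1:
                        sign = -sign
                di, dj, dk = offsets
                acc += sign * vol[2 * i + di][2 * j + dj][2 * k + dk]
            out[i][j][k] = scale * acc
        bands["".join(band)] = out
    coarse = bands.pop("LLL")
    return coarse, bands


def decompose(vol, levels):
    """Recursively decompose the coarse band, collecting details per level."""
    coarse, details_per_level = vol, []
    for _ in range(levels):
        coarse, details = haar3d_level(coarse)
        details_per_level.append(details)
    return coarse, details_per_level


tsdf = sphere_tsdf(n=8, radius=2.5, trunc=1.0)
coarse, details = decompose(tsdf, levels=2)
print(len(coarse), [len(d) for d in details])  # prints: 2 [7, 7]
```

Each level halves the resolution of the coarse volume (here 8 to 4 to 2) and yields seven detail subbands. In smooth regions of the field the detail coefficients are close to zero, which is consistent with the detail-coefficient sparsity mentioned in the Figure 4 caption.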