AutoPartGen: Autogressive 3D Part Generation and Discovery

Minghao Chen, Jianyuan Wang, Roman Shapovalov, Tom Monnier, Hyunyoung Jung, Dilin Wang, Rakesh Ranjan, Iro Laina, Andrea Vedaldi

TL;DR

AutoPartGen presents an autoregressive framework for compositional 3D generation, modeling objects as sequences of parts within a latent 3D shape space built on VecSet and diffusion. It unifies object-to-parts, image-to-parts, and masks-to-parts tasks in a single model, generating an unknown number of parts conditioned on previously created parts and evidence, without requiring extra optimization. The method achieves state-of-the-art part generation on PartObjaverse-Tiny, and demonstrates scalable applications to 3D scenes and city generation, including integration with SynCity. By combining latent compositionality, diffusion-based part completion, and autoregressive generation, AutoPartGen enables coherent, flexible, and scalable 3D content creation from diverse inputs.

Abstract

We introduce AutoPartGen, a model that generates objects composed of 3D parts in an autoregressive manner. This model can take as input an image of an object, 2D masks of the object's parts, or an existing 3D object, and generate a corresponding compositional 3D reconstruction. Our approach builds upon 3DShape2VecSet, a recent latent 3D representation with powerful geometric expressiveness. We observe that this latent space exhibits strong compositional properties, making it particularly well-suited for part-based generation tasks. Specifically, AutoPartGen generates object parts autoregressively, predicting one part at a time while conditioning on previously generated parts and additional inputs, such as 2D images, masks, or 3D objects. This process continues until the model decides that all parts have been generated, thus determining automatically the type and number of parts. The resulting parts can be seamlessly assembled into coherent objects or scenes without requiring additional optimization. We evaluate both the overall 3D generation capabilities and the part-level generation quality of AutoPartGen, demonstrating that it achieves state-of-the-art performance in 3D part generation.
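
The autoregressive loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `generate_parts`, `sample_next_part`, `toy_sampler`, the `max_parts` cap, and the use of `None` as the stop signal are all hypothetical names and conventions chosen for illustration. In the real model each step would sample a part latent with a 3D latent diffusion model; here a toy stand-in is used so that the control flow can be run as-is.

```python
from typing import Callable, List, Optional, Sequence

Latent = List[float]  # hypothetical stand-in for one part's latent code


def generate_parts(
    sample_next_part: Callable[[Sequence[Latent], object], Optional[Latent]],
    evidence: object,
    max_parts: int = 32,
) -> List[Latent]:
    """Generate parts one at a time, each conditioned on the parts so far.

    `sample_next_part` is assumed to return None when the model decides
    that every part has been generated (an illustrative convention).
    `max_parts` is a safety cap added for this sketch only.
    """
    parts: List[Latent] = []
    for _ in range(max_parts):
        next_part = sample_next_part(parts, evidence)
        if next_part is None:  # stop signal: no more parts to generate
            break
        parts.append(next_part)
    return parts


def toy_sampler(previous: Sequence[Latent], evidence) -> Optional[Latent]:
    """Toy stand-in for the sampler: one 'part' per evidence entry."""
    if len(previous) >= len(evidence):
        return None
    return [float(evidence[len(previous)])]


parts = generate_parts(toy_sampler, evidence=[0.1, 0.2, 0.3])
print(len(parts))
```

The number of parts is never passed in: the loop ends when the sampler signals completion, which mirrors how the abstract describes the part count being determined automatically.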

Paper Structure

This paper contains 41 sections, 2 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: AutoPartGen can be applied, by itself or in combination with other models, to the generation of compositional 3D objects, scenes and cities starting from 3D models, images or text.
  • Figure 2: AutoPartGen generates parts autoregressively. At each step, a 3D latent diffusion model generates the next part, conditioned on the previously generated parts $\boldsymbol{z}^{(1,\dots,k)}$, the overall object $\tilde{\boldsymbol{z}}$, and, optionally, an image $I$ of the object and an image $J^{(k)}$ of the part. The latent representation uses 3DShape2VecSet and the diffusion model is a DiT.
  • Figure 3: Compositionality of the VecSet space. Concatenating two latents yields a spatially combined mesh.
  • Figure 4: Image-to-parts scenario. Given an input image, AutoPartGen recovers a compositional 3D object made up of several meaningful and complete parts.
  • Figure 5: Object-to-parts scenario. Given an input 3D object, AutoPartGen regenerates it as a composition of meaningful and complete 3D parts.
  • ...and 7 more figures