
PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision

Konstantinos Tertikas, Despoina Paschalidou, Boxiao Pan, Jeong Joon Park, Mikaela Angelina Uy, Ioannis Emiris, Yannis Avrithis, Leonidas Guibas

TL;DR

PartNeRF tackles editable 3D shape synthesis without explicit 3D supervision by representing objects as $M$ locally defined NeRFs, each with its own pose and scale. A decomposition network produces per-part shape/texture codes $\boldsymbol{z}_m^s$, $\boldsymbol{z}_m^t$, while a structure network predicts per-part transforms $(\boldsymbol{R}_m,\boldsymbol{t}_m,\boldsymbol{s}_m)$; hard ray-part assignment ensures local edits affect only the targeted part. Trained with a suite of losses on posed images and masks, PartNeRF achieves plausible, textured 3D shapes and supports editing operations such as rigid/non-rigid part transformations, part mixing, and texture edits across ShapeNet categories. The approach advances 3D content creation by enabling intuitive, part-level control from 2D supervision, with potential extensions to richer representations and moving-object scenarios.

Abstract

Impressive progress in generative models and implicit representations gave rise to methods that can generate 3D shapes of high quality. However, being able to locally control and edit shapes is another essential property that can unlock several content creation applications. Local control can be achieved with part-aware models, but existing methods require 3D supervision and cannot produce textures. In this work, we devise PartNeRF, a novel part-aware generative model for editable 3D shape synthesis that does not require any explicit 3D supervision. Our model generates objects as a set of locally defined NeRFs, augmented with an affine transformation. This enables several editing operations such as applying transformations on parts, mixing parts from different objects etc. To ensure distinct, manipulable parts we enforce a hard assignment of rays to parts that makes sure that the color of each ray is only determined by a single NeRF. As a result, altering one part does not affect the appearance of the others. Evaluations on various ShapeNet categories demonstrate the ability of our model to generate editable 3D objects of improved fidelity, compared to previous part-based generative approaches that require 3D supervision or models relying on NeRFs.
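
The hard assignment of rays to parts can be illustrated with a small sketch. The Figure 3 caption states that a ray is associated with the first part it intersects; everything else here is an assumption made for illustration: each part's extent is approximated by a circle (center and radius), and the helper names `ray_sphere_entry` and `assign_rays_to_parts` are hypothetical, not taken from the paper or its code.

```python
import math

def ray_sphere_entry(origin, direction, center, radius):
    """Smallest non-negative distance t at which the ray hits the sphere, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses this part entirely
    sqrt_disc = math.sqrt(disc)
    for t in ((-b - sqrt_disc) / (2.0 * a), (-b + sqrt_disc) / (2.0 * a)):
        if t >= 0.0:
            return t
    return None  # the part lies behind the ray origin

def assign_rays_to_parts(rays, parts):
    """Hard assignment: each ray goes to the first part it intersects, or to none."""
    assignment = {m: [] for m in range(len(parts))}
    for ray_id, (origin, direction) in enumerate(rays):
        best_part, best_t = None, math.inf
        for m, (center, radius) in enumerate(parts):
            t = ray_sphere_entry(origin, direction, center, radius)
            if t is not None and t < best_t:
                best_part, best_t = m, t
        if best_part is not None:
            assignment[best_part].append(ray_id)
    return assignment

# A 2D layout loosely mirroring Figure 3: three parts, two rays.
# Part 1 sits behind part 0 along the first ray, so it receives no rays.
parts = [((0.0, 0.0), 1.0), ((3.0, 0.0), 1.0), ((0.0, 4.0), 1.0)]
rays = [((-3.0, 0.0), (1.0, 0.0)), ((-3.0, 4.0), (1.0, 0.0))]
assignment = assign_rays_to_parts(rays, parts)
```

Because every ray ends up in at most one part's list, changing the appearance of one part can only alter the pixels of rays assigned to it, which is the property the abstract attributes to the hard assignment.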

Paper Structure

This paper contains 33 sections, 36 equations, 30 figures, 7 tables.

Figures (30)

  • Figure 1: Part-Aware Controllable 3D Shape Generation and Editing. We address the task of part-aware 3D shape generation and editing without explicit 3D supervision. Prior part-aware generative models [Hertz2022SIGGRAPH, Hao2020CVPR] assume 3D supervision at training time and only allow changing the shape of the object. In this work, we introduce PartNeRF, a generative model capable of editing the shape and appearance of generated shapes that are parametrized as a collection of locally defined NeRFs.
  • Figure 2: Method Overview. Our generative model is implemented as an auto-decoder and comprises three main components. The Decomposition Network takes two object-specific learnable embeddings $\{\mathbf{z}^s, \mathbf{z}^t\}$ that represent the object's shape and texture and maps them to a set of $M$ latent codes that control the shape and texture of each part. First, we map $\mathbf{z}^s$ and $\mathbf{z}^t$ to $M$ per-part embeddings $\{\hat{\mathbf{z}}_m^s\}_{m=1}^M$ and $\{\hat{\mathbf{z}}_m^t\}_{m=1}^M$ using $M$ linear projections, which are then fed to two transformer encoders, $\tau^s_\theta$ and $\tau^t_\theta$, that predict the final per-part shape and texture embeddings, $\{\mathbf{z}_m^s\}_{m=1}^M$ and $\{\mathbf{z}_m^t\}_{m=1}^M$. Next, the Structure Network maps the per-part shape feature representation $\mathbf{z}_m^s$ to a rotation matrix $\mathbf{R}_m$, a translation vector $\mathbf{t}_m$ and a scale vector $\mathbf{s}_m$ that define the coordinate system of the $m$-th part and its spatial extent. The last component of our model is the Neural Rendering module, which takes the 3D points along each ray, transformed to the coordinate frame of its associated part, and maps them to an occupancy and a color value. We use plate notation to denote repetition over the $M$ parts.
  • Figure 3: Ray-Part Association. We illustrate the hard assignment between rays and parts in a 2D example with 3 parts and two rays $r$ and $r'$. Since the association between rays and parts is determined based on the first part that a ray intersects, the associations that emerge from \ref{eq:per_part_rays} are $\mathcal{R}_1=\{r\}$, $\mathcal{R}_2=\emptyset$ and $\mathcal{R}_3=\{r'\}$.
  • Figure 4: Scene-Specific Editing. The top and bottom row show the rendered images and the part-based geometries respectively. The 1st column shows the tractor from a novel view, before editing. In the 2nd, we select the bucket and rotate it downwards, whereas in the 3rd we translate the cockpit to the floor. In the 4th, we perform isotropic scaling of the cockpit and in the last we change the color of the bucket to red.
  • Figure 5: Soft Ray-Part Assignment. We demonstrate that enforcing a soft ray-part assignment results in parts that do not preserve their texture across transformations.
  • ...and 25 more figures
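
The per-part transforms in the Figure 2 caption and the edits in the Figure 4 caption can be tied together with a short sketch. The captions only state that points are transformed to the coordinate frame of their associated part and that $\mathbf{s}_m$ defines the part's spatial extent. The specific convention used below, $\mathbf{R}_m^\top(\mathbf{x} - \mathbf{t}_m)$ followed by an axis-aligned ellipsoid test with semi-axes $\mathbf{s}_m$, is an assumption, and the function names are hypothetical.

```python
import math

def rotation_z(angle):
    """3x3 rotation matrix about the z-axis, mapping part-local to world directions."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def world_to_part(x, R, t):
    """Map a world-space point into a part's local frame as R^T (x - t) (assumed convention)."""
    d = [xi - ti for xi, ti in zip(x, t)]
    # Component j of R^T d is the dot product of column j of R with d.
    return [sum(R[i][j] * d[i] for i in range(3)) for j in range(3)]

def inside_extent(x_local, s):
    """Ellipsoid test in the local frame, with per-axis scale s as semi-axes (assumed)."""
    return sum((xi / si) ** 2 for xi, si in zip(x_local, s)) <= 1.0

# One elongated part centered at (1, 0, 0), long axis along local x.
t = [1.0, 0.0, 0.0]
s = [2.0, 0.5, 0.5]
point = [1.0, 1.5, 0.0]

# Before editing: identity rotation, the point lies outside the part.
R_before = rotation_z(0.0)
inside_before = inside_extent(world_to_part(point, R_before, t), s)

# Rigid edit: rotate the part by 90 degrees about z, so its long axis follows world y.
R_after = rotation_z(math.pi / 2.0)
inside_after = inside_extent(world_to_part(point, R_after, t), s)
```

Under these assumptions, a rigid edit changes only the part's own $(\mathbf{R}_m, \mathbf{t}_m, \mathbf{s}_m)$; the local NeRF is queried in the same local coordinates as before, which is consistent with the rotation, translation, and scaling edits shown in Figure 4.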