
Articulate your NeRF: Unsupervised articulated object modeling via conditional view synthesis

Jianning Deng, Kartic Subr, Hakan Bilen

TL;DR

A novel unsupervised method for learning the pose and part segmentation of articulated objects with rigid parts. It fits an implicit model to the first observation, then distils the part segmentation and articulation from the second observation by rendering that second observation.

Abstract

We propose a novel unsupervised method to learn the pose and part segmentation of articulated objects with rigid parts. Given two observations of an object in different articulation states, our method learns the geometry and appearance of the object parts by fitting an implicit model to the first observation, and distils the part segmentation and articulation from the second observation while rendering the latter. Additionally, to tackle the complexities of jointly optimizing part segmentation and articulation, we propose a voxel grid-based initialization strategy and a decoupled optimization procedure. Compared with prior unsupervised work, our model obtains significantly better performance and generalizes to objects with multiple parts, while it can be learned efficiently from few views of the latter observation.
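The idea of articulating rigid parts under a part segmentation can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the soft per-point part probabilities, and the blending of per-part rigid transforms are assumptions chosen only to show what "one rigid transform per segmented part" means in practice.

```python
import numpy as np


def rotation_about_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def articulate_points(points, part_probs, rotations, translations):
    """Move points with per-part rigid transforms (hypothetical helper).

    points:       (N, 3) 3D points
    part_probs:   (N, K) soft part assignment, each row sums to 1
    rotations:    (K, 3, 3) one rotation per part
    translations: (K, 3) one translation per part
    """
    # transformed[k, n] = R_k @ points[n] + t_k
    transformed = np.einsum("kij,nj->kni", rotations, points)
    transformed = transformed + translations[:, None, :]
    # Blend the K candidate positions of each point by its part probabilities.
    return np.einsum("nk,kni->ni", part_probs, transformed)


# Two parts: part 0 is static, part 1 rotates by 90 degrees about z.
rotations = np.stack([np.eye(3), rotation_about_z(np.pi / 2)])
translations = np.zeros((2, 3))
points = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
part_probs = np.array([[1.0, 0.0], [0.0, 1.0]])
moved = articulate_points(points, part_probs, rotations, translations)
```

With hard (one-hot) probabilities each point simply follows its own part; soft probabilities keep the assignment differentiable, which is one plausible reason to use them when the segmentation is learned from a photometric error.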

Paper Structure (33 sections, 5 equations, 12 figures, 7 tables)


Figures (12)

  • Figure 1: (a) Our method learns the geometry and appearance of an articulated object by first fitting a NeRF from (source) images of an object in a fixed articulation. Then, from another set of (target) images of the object in another articulation, we distill the relative articulation and part labels. Green lines show the gradient path during this distillation. (b) Using the part geometry and appearance from NeRF, we render the target images by compositing the parts after applying the predicted articulations to the segmented parts. The photometric error provides the required supervision for learning the parts and their articulation without ground-truth labels.
  • Figure 2: Voxel initialization: identify the voxels belonging to moved parts based on pixel opacity difference.
  • Figure 3: Illustration of the optimization of $M_\ell$. The green dotted line shows the gradient flow.
  • Figure 4: Qualitative 2D part segmentation results. Green pixels denote the movable parts. Our method performs consistently across all tested objects, whereas PARIS fails on Blade, Laptop and Scissor.
  • Figure 5: Qualitative results for 2D multi-part segmentation. The pink color denotes the static part, while other colors denote the moving parts.
  • ...and 7 more figures
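
The voxel initialization named in the Figure 2 caption (finding voxels of moved parts from a pixel opacity difference) can be sketched roughly as follows. This is a single-view illustration under stated assumptions, not the paper's procedure: the threshold, the pinhole projection, the camera-frame voxel centers, and both function names are hypothetical.

```python
import numpy as np


def moved_pixel_mask(rendered_opacity, observed_opacity, threshold=0.5):
    """Mark pixels whose opacity differs strongly between two images.

    rendered_opacity: (H, W) opacity rendered from the source-state model
    observed_opacity: (H, W) opacity (foreground) of the target image
    threshold:        assumed cut-off, not taken from the paper
    """
    return np.abs(rendered_opacity - observed_opacity) > threshold


def vote_voxels(voxel_centers, intrinsics, mask):
    """Flag voxels whose center projects onto a marked pixel.

    voxel_centers: (V, 3) centers in camera coordinates (assumption)
    intrinsics:    (3, 3) pinhole camera matrix (assumption)
    mask:          (H, W) boolean mask from moved_pixel_mask
    """
    proj = voxel_centers @ intrinsics.T
    in_front = voxel_centers[:, 2] > 0
    # Avoid dividing by non-positive depth; those voxels are rejected below.
    depth = np.where(in_front, proj[:, 2], 1.0)
    u = np.round(proj[:, 0] / depth).astype(int)  # column index
    v = np.round(proj[:, 1] / depth).astype(int)  # row index
    height, width = mask.shape
    inside = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    hits = np.zeros(len(voxel_centers), dtype=bool)
    hits[inside] = mask[v[inside], u[inside]]
    return hits


# Toy example: one pixel at row 1, column 2 changes opacity.
rendered = np.zeros((4, 4))
rendered[1, 2] = 1.0
observed = np.zeros((4, 4))
mask = moved_pixel_mask(rendered, observed)
centers = np.array([[2.0, 1.0, 1.0], [0.0, 0.0, 1.0],
                    [10.0, 10.0, 1.0], [2.0, 1.0, -1.0]])
hits = vote_voxels(centers, np.eye(3), mask)
```

A single view cannot tell which depth along a marked ray actually moved, so a practical variant would presumably accumulate such votes over several target views before deciding which voxels belong to a moved part.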