NARF24: Estimating Articulated Object Structure for Implicit Rendering

Stanley Lewis, Tom Gao, Odest Chadwicke Jenkins

TL;DR

NARF24 tackles the challenge of articulated-object understanding for robots by learning a shared NeRF across a few configurations and using image-space part segmentations to infer joint parameters. The method builds per-part point clouds, registers them across scenes with ICP and Teaser++, and estimates joint connectivity and type via Chamfer-distance comparisons to enable URDF-like modeling and configuration-conditioned rendering in an articulation-aware NeRF. Real-world experiments (including a sparse-label scenario) and a simulated 6-DOF arm demonstrate that accurate articulation estimation and configurable rendering are achievable with limited segmentation data. The approach promises scalable articulation modeling by combining NeRF with parts-based segmentation, registration, and classical joint-estimation techniques.
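The Chamfer-distance comparison mentioned above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the synthetic point cloud, and the two candidate motions (a rotation about the z-axis and a translation along x) are assumptions chosen only to show how a lower Chamfer distance can favor one joint hypothesis over another.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    # Pairwise squared Euclidean distances, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Mean nearest-neighbour squared distance, summed over both directions.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def rotate_z(points: np.ndarray, angle: float) -> np.ndarray:
    """Rotate (N, 3) points by `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T


# Hypothetical data: one part observed in two configurations.
rng = np.random.default_rng(0)
part_a = rng.uniform(-1.0, 1.0, size=(200, 3))
part_b = rotate_z(part_a, np.pi / 6)  # the part actually rotated by 30 degrees

# Assumed candidate motions, one per joint type.
candidates = {
    "revolute": rotate_z(part_a, np.pi / 6),
    "prismatic": part_a + np.array([0.3, 0.0, 0.0]),
}
scores = {name: chamfer_distance(pred, part_b) for name, pred in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

In this toy setup the candidate motions are given; in practice they would have to be fitted from the registered part poses, a step this sketch omits.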

Abstract

Articulated objects and their representations pose a difficult problem for robots. These objects require not only representations of geometry and texture, but also of the various connections and joint parameters that make up each articulation. We propose a method that learns a common Neural Radiance Field (NeRF) representation across a small number of collected scenes. This representation is combined with a parts-based image segmentation to produce an implicit space part localization, from which the connectivity and joint parameters of the articulated object can be estimated, thus enabling configuration-conditioned rendering.
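To make "configuration-conditioned" concrete, the sketch below maps a joint configuration value to a rigid part transform. The axis-plus-origin parameterization follows the common URDF convention and is an assumption here, as are the function names; the abstract does not specify how NARF24 parameterizes its joints.

```python
import numpy as np


def revolute_transform(axis, origin, angle: float) -> np.ndarray:
    """4x4 transform rotating by `angle` radians about the line through
    `origin` with direction `axis` (Rodrigues' rotation formula)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    p = np.asarray(origin, dtype=float)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = p - R @ p  # keeps points on the joint axis fixed
    return T


def prismatic_transform(axis, displacement: float) -> np.ndarray:
    """4x4 transform translating by `displacement` along `axis`."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    T = np.eye(4)
    T[:3, 3] = displacement * k
    return T


# Rotate a point 90 degrees about a z-aligned axis passing through (1, 0, 0).
T_rev = revolute_transform([0, 0, 1], [1, 0, 0], np.pi / 2)
moved = T_rev @ np.array([2.0, 0.0, 0.0, 1.0])

# Slide a point 0.5 units along the x-axis.
T_pri = prismatic_transform([1, 0, 0], 0.5)
slid = T_pri @ np.array([0.0, 0.0, 0.0, 1.0])
```

Applying such a transform to a part's sample points before querying a radiance field is one plausible way to render a counterfactual configuration; whether NARF24 does exactly this is not stated in the abstract.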

Paper Structure (10 sections, 5 figures)


Figures (5)

  • Figure 1: NARF24 is a pipeline that takes in part-segmented images of an articulated object at a small number of configurations, then uses a scene-conditioned neural radiance field to estimate part poses and joint parameters. These produce a URDF model and, subsequently, an articulation-enabled NeRF for configurable rendering.
  • Figure 2: NARF24's output when trained on the clamp in the ProgressTools dataset. Left: The original dataset image. Middle: The NARF24 output at the original pose and configuration values. Right: The NARF24 output at a counterfactual configuration (fully closed).
  • Figure 3: NARF24's output when trained on the clamp with only 5 percent segmentation labeling. Left: The ground truth image. Center and right: Two counterfactual renderings of the clamp at different configurations, overlaid on the greyscale original image for context.
  • Figure 4: Qualitative ablation study on training a single-part NeRF after the part registration step. Top: The ground truth part. Bottom: The NeRF rendering. Left (ICP only): Performs adequately. Center (Teaser++ only): Fails for single-part NeRF training. Right (ICP & Teaser++): Produces the best output.
  • Figure 5: Left: Example output from the simulator environment. Right: Renderings of the arm at two different configurations of every joint, after training on the generated sim data (overlaid at the base part).