NAP: Neural 3D Articulation Prior

Jiahui Lei; Congyue Deng; Bokui Shen; Leonidas Guibas; Kostas Daniilidis

TL;DR

NAP (Neural 3D Articulation Prior) is the first deep generative approach to synthesizing 3D articulated objects; it models geometry and motion jointly through an articulation tree/graph parameterization. A diffusion-denoising process on graphs, driven by a graph-attention network, learns the joint distribution of parts and joints, and a post-processing step then extracts a valid articulated structure. A new Instantiation Distance metric evaluates both shape and motion fidelity, and the framework supports conditioned generation such as Part2Motion, PartNet-Imagination, Motion2Part, and GAPart2Object. The approach provides a principled prior for articulated objects, with potential impact on design, robotics, and interactive simulation.

Abstract

We propose Neural 3D Articulation Prior (NAP), the first 3D deep generative model to synthesize 3D articulated object models. Despite the extensive research on generating 3D objects, compositions, or scenes, there remains a lack of focus on capturing the distribution of articulated objects, a common object category for human and robot interaction. To generate articulated objects, we first design a novel articulation tree/graph parameterization and then apply a diffusion-denoising probabilistic model over this representation where articulated objects can be generated via denoising from random complete graphs. In order to capture both the geometry and the motion structure whose distribution will affect each other, we design a graph-attention denoising network for learning the reverse diffusion process. We propose a novel distance that adapts widely used 3D generation metrics to our novel task to evaluate generation quality, and experiments demonstrate our high performance in articulated object generation. We also demonstrate several conditioned generation applications, including Part2Motion, PartNet-Imagination, Motion2Part, and GAPart2Object.
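
The abstract describes a diffusion-denoising model over the articulation graph attributes. As a rough illustration of the forward (noising) half of such a model, the sketch below applies the standard DDPM closed-form step to a flattened attribute vector. It is not taken from the paper: the linear beta schedule, the step count, the function names `make_alpha_bars` and `diffuse`, and the toy vector `x0` are all assumptions made for this example.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The linear schedule and its endpoints are assumed for illustration.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * v + math.sqrt(1.0 - a) * n for v, n in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
x0 = [1.0, -0.5, 0.25, 0.0]  # hypothetical flattened node/edge attributes
xt, eps = diffuse(x0, 999, alpha_bars, rng)
```

At the last step the cumulative product is close to zero, so `xt` is almost pure Gaussian noise; generation would run the learned reverse process starting from such a sample.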

Paper Structure (34 sections, 11 equations, 8 figures, 2 tables)

Introduction
Related Work
Articulated object modeling
Generative models for structures
Diffusion models in 3D
Method
Articulation Tree Parameterization
Graph-based representation
Nodes
Edges
Diffusion-Based Articulation Tree Generation
Forward diffusion
Reverse process
Training objective
Output Extraction
...and 19 more sections

Figures (8)

Figure 1: Neural 3D Articulation Prior (NAP) can unconditionally generate articulated objects (left). It can be conditioned on just parts or joints (mid), a subset of parts plus joints, or over-segmented static objects (right).
Figure 2: Method: Parameterization (Top): We parameterize the articulated object as a tree, whose nodes are rigid parts and edges are joints; we then pad the tree to a complete graph of maximum node number and store it in the articulation graph attribute list ${\mathbf{x}}$. Articulated objects including the joint motion given joint states can be animated from this representation. Forward Diffusion (Middle): The parameterized attribute list ${\mathbf{x}}$ is gradually diffused to random noise. Generation (Bottom): Our Graph Attention Denoiser (Fig. \ref{fig:method_network}, Sec. \ref{sec:method_network}) samples a random articulation graph ${\mathbf{x}}_T$, gradually removes noise, and finally predicts ${\mathbf{x}}_0$. A minimum-spanning-tree algorithm is applied to the generated graph in the end to find the kinematic tree structure.
Figure 3: Network architecture. Left: input the node and edge list of a noisy articulation graph, a stack of graph layers will fuse and exchange information on the graph and output the noise that has to be removed. Right: details in the graph layer.
Figure 4: Articulated object generation results. Each generated object is visualized with (1) graph topology (top left), where edge color indicates joint type (blue: prismatic, red: revolute, orange: hybrid); (2) the predicted part bounding boxes and joints under different joint states (second column), and the overlay of multiple states reflecting the possible motion (bottom left); (3) reconstructed part meshes from the generated shape code (third column); (4) retrieved part meshes (right column).
Figure 5: Part2Motion: Known part condition on the left, diverse motion proposals on the right.
...and 3 more figures
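
The Figure 2 caption mentions that a minimum-spanning-tree algorithm turns the generated complete graph into a kinematic tree. A minimal sketch of that kind of extraction, using Kruskal's algorithm, is shown below. The edge costs, the convention that a lower cost means a more confidently predicted joint, and the name `extract_tree` are assumptions for illustration; the sketch also assumes padded (non-existent) nodes have already been dropped. The paper's actual edge weighting is not given in this summary.

```python
def extract_tree(num_nodes, edge_cost):
    """Kruskal's algorithm: return the edges of a minimum spanning tree.

    edge_cost maps an undirected pair (i, j) with i < j to a cost. Treating a
    lower cost as a more confident joint is an assumption of this sketch.
    """
    parent = list(range(num_nodes))

    def find(i):
        # Follow parent links to the set representative, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for (i, j), _cost in sorted(edge_cost.items(), key=lambda kv: kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it cannot close a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Hypothetical 4-part object: costs on the complete graph over its parts.
costs = {
    (0, 1): 0.1, (0, 2): 0.2, (0, 3): 0.9,
    (1, 2): 0.8, (1, 3): 0.3, (2, 3): 0.7,
}
tree_edges = extract_tree(4, costs)
print(tree_edges)  # [(0, 1), (0, 2), (1, 3)]
```

A spanning tree over N parts always has N - 1 edges, which matches the requirement that every part is attached to the object by exactly one path of joints.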
