CAGE: Controllable Articulation GEneration

Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi-Amiri, Manolis Savva

TL;DR

This work leverages the interplay between part shape, connectivity, and motion using a denoising diffusion-based method with attention modules designed to extract correlations between part attributes to generate 3D articulated objects in a controllable fashion.

Abstract

We address the challenge of generating 3D articulated objects in a controllable fashion. Currently, modeling articulated 3D objects is achieved either through laborious manual authoring or through methods from prior work that are hard to scale and control directly. We leverage the interplay between part shape, connectivity, and motion using a denoising diffusion-based method with attention modules designed to extract correlations between part attributes. Our method takes an object category label and a part connectivity graph as input and generates an object's geometry and motion parameters. The generated objects conform to user-specified constraints on the object category, part shape, and part articulation. Our experiments show that our method outperforms the state-of-the-art in articulated object generation, producing more realistic objects while conforming better to user constraints. Video summary: http://youtu.be/cH_rbKbyTpE

Paper Structure (22 sections, 2 equations, 12 figures, 4 tables)

Figures (12)

  • Figure 1: We present CAGE: a user-controllable generative model for 3D articulated objects. Left: given an object category label and a directed graph describing the interconnections among constituent parts, our model generates an abstraction of the articulated object specifying both geometry and motion parameters for each part. Right: the generated shape abstraction combined with appropriately constrained part retrieval allows for generating high-quality articulated objects under various user-specified constraints. Users can specify a desired object category, part shape, articulation type, or articulation axis and obtain generated objects that respect the provided constraints.
  • Figure 2: Method overview. Our generative model is based on DDPM (Ho et al., 2020). In the forward process, Gaussian noise is iteratively added to corrupt the data from $\mathbf{x}_0$ to random noise $\mathbf{x}_T$. During the reverse process, our denoiser (in yellow highlight) predicts the residual noise to be subtracted from the input data $\mathbf{x}_t$ at timestep $t$, conditioned on the category label $c$ and a graph adjacency $G$ that is injected as an attention mask in the Graph Relation Attention module. All timesteps share the same denoiser, which is built on layers of our Attribute Attention Blocks.
  • Figure 3: Design of the attention modules within our attribute attention blocks (AAB). Each node attribute is projected to a separate token and sequentially passed to three attention modules with varied masking strategies. White cells signify activated attention positions, whereas grey cells indicate attention that has been masked out. In graph relation attention, the activated cells represent the parent and child parts associated with each node.
  • Figure 4: Qualitative results conditioned on graph structures (on the left) at different levels of complexity. We compare our method with a comparable version of NAP. Our generated objects are faithfully compatible with the graph input. In contrast, NAP fails to conform to the input constraint with flipped or disordered node connections. We denote inconsistent graph connections using red arrows.
  • Figure 5: Part$\rightarrow$Motion: generated results conditioned on graphs specifying part bounding boxes.
  • ...and 7 more figures
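
The captions for Figures 2 and 3 describe the graph adjacency $G$ acting as an attention mask in which each node's activated cells are its parent and child parts. The following is a minimal sketch of that idea, not the authors' implementation. The function names `graph_relation_mask` and `masked_softmax`, the `(parent, child)` edge format, and the `include_self` option (letting a part attend to itself) are all assumptions made for illustration.

```python
import math


def graph_relation_mask(num_parts, edges, include_self=True):
    """Build a boolean mask where True means attention is allowed.

    edges: list of (parent, child) index pairs from a directed part graph.
    include_self is an illustrative assumption, not taken from the paper.
    """
    mask = [[False] * num_parts for _ in range(num_parts)]
    for parent, child in edges:
        mask[parent][child] = True  # parent attends to its child
        mask[child][parent] = True  # child attends to its parent
    if include_self:
        for i in range(num_parts):
            mask[i][i] = True
    return mask


def masked_softmax(scores, mask):
    """Row-wise softmax that assigns zero weight to masked-out positions."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        allowed = [s for s, m in zip(score_row, mask_row) if m]
        if not allowed:
            raise ValueError("each row needs at least one unmasked position")
        peak = max(allowed)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) if m else 0.0
                for s, m in zip(score_row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical three-part chain: part 0 -> part 1 -> part 2.
edges = [(0, 1), (1, 2)]
mask = graph_relation_mask(3, edges)
uniform_scores = [[0.0] * 3 for _ in range(3)]
weights = masked_softmax(uniform_scores, mask)
```

In this example, parts 0 and 2 are not directly connected, so they receive zero attention weight from each other, while part 1 attends to all three parts.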