ArtFormer: Controllable Generation of Diverse 3D Articulated Objects
Jiayi Su, Youhe Feng, Zheng Li, Jinhua Song, Yangfan He, Botao Ren, Botian Xu
TL;DR
ArtFormer introduces a tree-structured articulation parameterization and a diffusion-based SDF shape prior to jointly generate high-quality geometry and kinematic relations for 3D articulated objects. A dedicated Articulation Transformer with tree-position embeddings and cross-attention enables conditional, autoregressive decoding of parts, while the shape prior ensures diverse yet plausible geometry. Experiments on text- and image-conditioned generation demonstrate strong geometry fidelity, accurate joint relations, and enhanced diversity compared with baselines, with additional support from ablations and human studies. The approach supports novel shape generation and editing, offering a flexible framework for scalable, controllable articulated object synthesis with potential applications in robotics and digital twins.
Abstract
This paper presents a novel framework for modeling and conditional generation of 3D articulated objects. Existing methods face a tradeoff between flexibility and quality, and are often limited to using predefined structures or retrieving shapes from static datasets. To address these challenges, we parameterize an articulated object as a tree of tokens and employ a transformer to generate both the object's high-level geometry code and its kinematic relations. Each sub-part's geometry is then decoded using a signed-distance-function (SDF) shape prior, which facilitates the synthesis of high-quality 3D shapes. Our approach enables the generation of diverse objects with high-quality geometry and a varying number of parts. Comprehensive experiments on conditional generation from text descriptions demonstrate the effectiveness and flexibility of our method.
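To make the "tree of tokens" idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each part carries a joint type, a joint axis, and a latent geometry code, and it serializes the tree depth-first so that every parent precedes its children, which is one ordering an autoregressive decoder could use. All class names, field names, and the example cabinet are hypothetical; the two-element geometry codes are placeholders for whatever latent the SDF shape prior would consume.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PartNode:
    """One part of an articulated object (all field names are hypothetical)."""
    name: str
    joint_type: str                          # e.g. "fixed", "revolute", "prismatic"
    joint_axis: Tuple[float, float, float]   # axis of the joint to the parent
    geometry_code: List[float]               # placeholder for a latent shape code
    children: List["PartNode"] = field(default_factory=list)


@dataclass
class PartToken:
    """Flattened view of a node, in a form a sequence model could consume."""
    name: str
    parent_index: Optional[int]              # None for the root
    tree_path: Tuple[int, ...]               # child indices from the root to this node
    joint_type: str
    joint_axis: Tuple[float, float, float]
    geometry_code: List[float]


def flatten(root: PartNode) -> List[PartToken]:
    """Serialize the tree depth-first so every parent precedes its children."""
    tokens: List[PartToken] = []

    def visit(node: PartNode, parent_index: Optional[int], path: Tuple[int, ...]) -> None:
        index = len(tokens)  # position this node will occupy in the sequence
        tokens.append(PartToken(node.name, parent_index, path,
                                node.joint_type, node.joint_axis,
                                node.geometry_code))
        for i, child in enumerate(node.children):
            visit(child, index, path + (i,))

    visit(root, None, ())
    return tokens


# Hypothetical example: a cabinet body with one hinged door and one sliding drawer.
cabinet = PartNode("body", "fixed", (0.0, 0.0, 0.0), [0.1, 0.2], children=[
    PartNode("door", "revolute", (0.0, 0.0, 1.0), [0.3, 0.4]),
    PartNode("drawer", "prismatic", (1.0, 0.0, 0.0), [0.5, 0.6]),
])
tokens = flatten(cabinet)
for t in tokens:
    print(t.name, t.parent_index, t.tree_path, t.joint_type)
```

In this sketch, `parent_index` encodes the kinematic relation and `tree_path` is one simple way to describe a node's position in the tree. How ArtFormer actually encodes tree positions and joint parameters is not specified in this abstract.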
