UniArt: Unified 3D Representation for Generating 3D Articulated Objects with Open-Set Articulation

Bu Jin, Weize Li, Songen Gu, Yupeng Zheng, Yuhang Zheng, Zhengyi Zhou, Yao Yao

TL;DR

UniArt presents a unified diffusion-based framework that directly synthesizes fully articulated 3D objects from a single image, integrating geometry, texture, segmentation, and kinematics into a single latent representation. A reversible joint-to-voxel embedding grounds articulation parameters in voxel geometry, while an open-set articulation formulation removes reliance on predefined joint vocabularies. The method combines a geometry-articulation VAE with a rectified-flow diffusion model to generate coherent shape and motion, achieving state-of-the-art results on PartNet-Mobility and demonstrating strong generalization to unseen categories. Empirically, UniArt delivers superior mesh quality, articulation accuracy, and perceptual alignment, with practical validation in robot simulation and real-world deployment scenarios.
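The rectified-flow component mentioned above can be illustrated with a generic Euler sampler. This is only a minimal sketch of rectified-flow sampling in general, not UniArt's model: `toy_velocity` is a hypothetical stand-in for a learned velocity network, and the time convention (t = 0 is noise, t = 1 is data) is an assumption that differs between implementations.

```python
import numpy as np

def toy_velocity(x, t, target):
    # Hypothetical stand-in for a learned velocity network v(x, t).
    # For straight-line paths x_t = (1 - t) * x_0 + t * x_1 that all end at a
    # single point `target`, the exact velocity is (target - x) / (1 - t).
    return (target - x) / (1.0 - t)

def euler_sample(x0, target, num_steps=10):
    # Integrate dx/dt = v(x, t) from t = 0 (noise) towards t = 1 (data)
    # with fixed-size Euler steps. The velocity is never evaluated at t = 1.
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * toy_velocity(x, t, target)
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal(4)              # starting point drawn from N(0, I)
target = np.array([1.0, -2.0, 0.5, 3.0])    # toy "data" point
sample = euler_sample(noise, target)
```

In a real generator the velocity would come from a trained network conditioned on the input image, and the integrated variable would be the latent rather than a 4-vector.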

Abstract

Articulated 3D objects play a vital role in realistic simulation and embodied robotics, yet manually constructing such assets remains costly and difficult to scale. In this paper, we present UniArt, a diffusion-based framework that directly synthesizes fully articulated 3D objects from a single image in an end-to-end manner. Unlike prior multi-stage techniques, UniArt establishes a unified latent representation that jointly encodes geometry, texture, part segmentation, and kinematic parameters. We introduce a reversible joint-to-voxel embedding, which spatially aligns articulation features with volumetric geometry, enabling the model to learn coherent motion behaviors alongside structural formation. Furthermore, we formulate articulation type prediction as an open-set problem, removing the need for fixed joint semantics and allowing generalization to novel joint categories and unseen object types. Experiments on the PartNet-Mobility benchmark demonstrate that UniArt achieves state-of-the-art mesh quality and articulation accuracy.
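The abstract does not spell out how the reversible joint-to-voxel embedding is constructed. As a purely hypothetical illustration of what "reversible" could mean here, the sketch below copies each part's joint-parameter vector onto the voxels of that part and recovers it by averaging per part. The function names, the per-voxel part labels, and the 7-dimensional parameter layout are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def joints_to_voxels(part_ids, joint_params):
    # part_ids:     (N,) int array, part index of each occupied voxel (assumed input)
    # joint_params: (P, D) float array, one joint-parameter vector per part
    # Returns an (N, D) array: every voxel carries its part's joint vector.
    return joint_params[part_ids]

def voxels_to_joints(part_ids, voxel_feats, num_parts):
    # Inverse mapping: average the voxel features belonging to each part.
    sums = np.zeros((num_parts, voxel_feats.shape[1]))
    np.add.at(sums, part_ids, voxel_feats)          # unbuffered scatter-add per part
    counts = np.bincount(part_ids, minlength=num_parts).astype(float)
    return sums / np.maximum(counts, 1.0)[:, None]  # guard against empty parts

# Toy object: 6 occupied voxels split over 3 parts, 7 parameters per joint
# (hypothetical layout, e.g. axis direction, pivot point, and a type value).
part_ids = np.array([0, 0, 1, 2, 2, 2])
rng = np.random.default_rng(0)
joint_params = rng.standard_normal((3, 7))

voxel_feats = joints_to_voxels(part_ids, joint_params)
recovered = voxels_to_joints(part_ids, voxel_feats, num_parts=3)
```

Averaging also gives a simple way to read joint parameters back out when the per-voxel values are noisy, which is one reason a voxel-aligned encoding might pair naturally with a generative model over volumetric latents.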

Paper Structure

This paper contains 27 sections, 13 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: We propose UniArt, a novel diffusion-based framework that generates robot-ready articulated 3D objects from a single image, enabling open-set generalization for scalable simulation and manipulation.
  • Figure 2: Overview of UniArt. We reformulate the articulated object creation task and introduce UniArt latent representations that jointly encode object geometry, appearance, part segmentation, and articulation parameters within a diffusion-based architecture.
  • Figure 3: Qualitative results of UniArt. Because retrieval-based methods lack appearance information, we assign a random color to each link to distinguish them. Our method is more consistent in both appearance and geometry, whereas the results of Singapo (Liu et al., 2024) suffer from articulation errors (red box), geometry inconsistency, and appearance inconsistency.
  • Figure 4: Qualitative results on unseen categories. The articulated objects generated by our method are consistent with the input images in both appearance and geometry, whereas previous retrieval-based methods fail to produce plausible results.
  • Figure 5: Application to robotic manipulation.