UniArt: Unified 3D Representation for Generating 3D Articulated Objects with Open-Set Articulation
Bu Jin, Weize Li, Songen Gu, Yupeng Zheng, Yuhang Zheng, Zhengyi Zhou, Yao Yao
TL;DR
UniArt is a unified diffusion-based framework that synthesizes fully articulated 3D objects directly from a single image, encoding geometry, texture, part segmentation, and kinematics in one latent representation. A reversible joint-to-voxel embedding grounds articulation parameters in voxel geometry, while an open-set articulation formulation removes the reliance on a predefined joint vocabulary. The method pairs a geometry-articulation VAE with a rectified-flow diffusion model so that shape and motion are generated coherently. On PartNet-Mobility, UniArt achieves state-of-the-art mesh quality, articulation accuracy, and perceptual alignment, and it generalizes well to unseen categories. The generated assets are further validated in robot simulation and real-world deployment scenarios.
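For readers unfamiliar with rectified flow, the following is a minimal, generic sketch of the idea and not UniArt's implementation. It assumes the common convention that t = 0 is noise and t = 1 is data, and it replaces the learned network with a hypothetical oracle velocity field. The function names (`interpolate`, `velocity_target`, `euler_sample`) are illustrative only.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def velocity_target(x0, x1):
    """Regression target for the velocity network: constant along the path."""
    return x1 - x0


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=np.float64)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x


# Toy usage: an oracle velocity that always points at a known target latent.
rng = np.random.default_rng(0)
noise = rng.normal(size=(4, 8))
data = rng.normal(size=(4, 8))
sample = euler_sample(lambda x, t: (data - x) / (1.0 - t), noise, num_steps=8)
```

In a real model, `velocity_fn` would be a trained network conditioned on the input image, and the sampled latent would be decoded by the VAE.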
Abstract
Articulated 3D objects play a vital role in realistic simulation and embodied robotics, yet manually constructing such assets remains costly and difficult to scale. In this paper, we present UniArt, a diffusion-based framework that directly synthesizes fully articulated 3D objects from a single image in an end-to-end manner. Unlike prior multi-stage techniques, UniArt establishes a unified latent representation that jointly encodes geometry, texture, part segmentation, and kinematic parameters. We introduce a reversible joint-to-voxel embedding, which spatially aligns articulation features with volumetric geometry, enabling the model to learn coherent motion behaviors alongside structural formation. Furthermore, we formulate articulation type prediction as an open-set problem, removing the need for fixed joint semantics and allowing generalization to novel joint categories and unseen object types. Experiments on the PartNet-Mobility benchmark demonstrate that UniArt achieves state-of-the-art mesh quality and articulation accuracy.
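To make the notion of a reversible joint-to-voxel embedding concrete, here is one plausible toy encoding. It is an assumption for illustration and not the scheme used in the paper: every voxel of a moving part stores the unit joint axis and the offset from its own center to a pivot point on the axis, so the joint parameters can be read back by averaging over the part. The names `embed_joint` and `recover_joint`, and the 6-channel layout, are hypothetical.

```python
import numpy as np


def _voxel_centers(shape):
    """Centers of all voxels in a grid of the given shape, in voxel units."""
    axes = [np.arange(n) + 0.5 for n in shape]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)


def embed_joint(part_mask, axis, pivot):
    """Write one joint into every voxel of its part.

    part_mask: (X, Y, Z) bool occupancy of the moving part.
    axis: (3,) joint direction; pivot: (3,) point on the axis, voxel units.
    Returns an (X, Y, Z, 6) field: unit axis, then center-to-pivot offset.
    """
    axis = np.asarray(axis, dtype=np.float64)
    axis = axis / np.linalg.norm(axis)
    centers = _voxel_centers(part_mask.shape)
    field = np.zeros(part_mask.shape + (6,))
    field[part_mask, :3] = axis
    field[part_mask, 3:] = np.asarray(pivot, dtype=np.float64) - centers[part_mask]
    return field


def recover_joint(field, part_mask):
    """Invert embed_joint by averaging the per-voxel votes over the part."""
    centers = _voxel_centers(part_mask.shape)
    axis = field[part_mask, :3].mean(axis=0)
    axis = axis / np.linalg.norm(axis)
    pivot = (field[part_mask, 3:] + centers[part_mask]).mean(axis=0)
    return axis, pivot


# Toy usage: a door-like slab hinged about a vertical axis.
mask = np.zeros((8, 8, 8), dtype=bool)
mask[2:6, 3:4, 1:7] = True
field = embed_joint(mask, axis=[0.0, 0.0, 2.0], pivot=[2.0, 3.5, 4.0])
axis, pivot = recover_joint(field, mask)
```

Because the joint lives in the same grid as the geometry, a generative model could treat these channels like any other voxel feature. Averaging at decode time would also smooth out per-voxel noise in a generated field.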
