FreeArt3D: Training-Free Articulated Object Generation using 3D Diffusion
Chuhao Chen, Isabella Liu, Xinyue Wei, Hao Su, Minghua Liu
TL;DR
FreeArt3D generates articulated 3D objects without task-specific training. It repurposes a pretrained static 3D diffusion prior and extends Score Distillation Sampling (SDS) to the 3D-to-4D setting, treating articulation as an additional generative dimension. Given a small set of images showing the object in different articulation states, it runs a per-object optimization over a two-part geometry (body and movable part), the joint parameters, and per-state angles, guided by SDS on a posed occupancy grid; Trellis Stage-2 texture and geometry generation then refines the result. The approach yields high-fidelity, textured meshes with accurate kinematic structures and generalizes across 12 categories, real-world captures, and a multi-joint extension to URDF-like configurations, with runtimes of minutes per object. By leveraging static diffusion priors, this training-free paradigm broadens open-world 3D articulation, with practical benefits for robotics, AR/VR, and digital twins; acceleration and robustness remain avenues for improvement.
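To make the idea of a "posed occupancy grid" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single revolute joint described by a point on the axis (`origin`), a direction (`axis`), and one angle per articulation state. The function names and the hard snapping of rotated cells to integer grid coordinates are illustrative assumptions; an SDS-based optimizer would need a differentiable version of this step.

```python
import math

def rotate_about_axis(point, origin, axis, angle):
    """Rotate a 3D point about the line (origin, axis) by `angle` radians
    using Rodrigues' rotation formula."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]                  # unit joint axis
    v = [p - o for p, o in zip(point, origin)]    # point relative to the axis origin
    c, s = math.cos(angle), math.sin(angle)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    rotated = [v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1 - c) for i in range(3)]
    return [r + o for r, o in zip(rotated, origin)]

def posed_occupancy(body_cells, part_cells, origin, axis, angle):
    """Return the occupied cells for one articulation state: the static body
    plus the movable part rotated by `angle` (hypothetical, non-differentiable)."""
    posed = set(body_cells)
    for cell in part_cells:
        x, y, z = rotate_about_axis(cell, origin, axis, angle)
        posed.add((round(x), round(y), round(z)))  # snap to the nearest grid cell
    return posed

# One articulation state: the part cell (1, 0, 0) swings 90 degrees about the z-axis.
state = posed_occupancy({(0, 0, 0)}, [(1, 0, 0)], (0, 0, 0), (0, 0, 1), math.pi / 2)
```

Evaluating `posed_occupancy` once per input image, each with its own angle but shared body, part, and joint parameters, mirrors the structure of the per-object optimization described above.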
Abstract
Articulated 3D objects are central to many applications in robotics, AR/VR, and animation. Recent approaches to modeling such objects either rely on optimization-based reconstruction pipelines that require dense-view supervision or on feed-forward generative models that produce coarse geometric approximations and often overlook surface texture. In contrast, open-world 3D generation of static objects has achieved remarkable success, especially with the advent of native 3D diffusion models such as Trellis. However, extending these methods to articulated objects by training native 3D diffusion models poses significant challenges. In this work, we present FreeArt3D, a training-free framework for articulated 3D object generation. Instead of training a new model on limited articulated data, FreeArt3D repurposes a pre-trained static 3D diffusion model (e.g., Trellis) as a powerful shape prior. It extends Score Distillation Sampling (SDS) into the 3D-to-4D domain by treating articulation as an additional generative dimension. Given a few images captured in different articulation states, FreeArt3D jointly optimizes the object's geometry, texture, and articulation parameters without requiring task-specific training or access to large-scale articulated datasets. Our method generates high-fidelity geometry and textures, accurately predicts underlying kinematic structures, and generalizes well across diverse object categories. Despite following a per-instance optimization paradigm, FreeArt3D completes in minutes and significantly outperforms prior state-of-the-art approaches in both quality and versatility. Please check our website for more details: https://czzzzh.github.io/FreeArt3D
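For reference, the standard SDS gradient, as introduced for text-to-3D generation, can be written as below. This is the generic noise-prediction form, not necessarily the exact objective used by FreeArt3D, which may differ because of the Trellis backbone. The symbols $g$, $q_k$, and $c_k$ are illustrative notation assumed here: $\theta$ collects the shared geometry and joint parameters, $q_k$ is the articulation angle of state $k$, $c_k$ is the input image for that state, and $g$ produces the posed representation passed to the frozen diffusion prior $\hat{\epsilon}_{\phi}$.

```latex
\nabla_{\theta}\,\mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_{\phi}(x_t;\, c_k,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x = g(\theta, q_k)
```

Here $x_t$ is $x$ noised to timestep $t$ with Gaussian noise $\epsilon$, and $w(t)$ is a timestep-dependent weight. Treating articulation as an extra generative dimension corresponds, in this notation, to summing the gradient over the states $k$ while sharing $\theta$ across them.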
