FreeArt3D: Training-Free Articulated Object Generation using 3D Diffusion

Chuhao Chen, Isabella Liu, Xinyue Wei, Hao Su, Minghua Liu

TL;DR

FreeArt3D tackles articulated 3D object generation without task-specific training by repurposing a pretrained static 3D diffusion prior and extending Score Distillation Sampling to the 3D-to-4D setting, treating articulation as an additional generative dimension. It formulates per-object optimization over a two-part geometry (body and movable part), joint parameters, and per-state angles from a small set of articulation-state images, guided by SDS on a posed occupancy grid and refined by Trellis Stage-2 texture and geometry generation. The approach yields high-fidelity, textured meshes with accurate kinematic structures and demonstrates strong generalization across 12 categories, real-world captures, and a multi-joint extension to URDF-like configurations, all with runtimes of minutes per object. This training-free paradigm broadens open-world 3D articulation by leveraging static diffusion priors, offering practical benefits for robotics, AR/VR, and digital twins while highlighting avenues for acceleration and robustness improvements.

Abstract

Articulated 3D objects are central to many applications in robotics, AR/VR, and animation. Recent approaches to modeling such objects either rely on optimization-based reconstruction pipelines that require dense-view supervision or on feed-forward generative models that produce coarse geometric approximations and often overlook surface texture. In contrast, open-world 3D generation of static objects has achieved remarkable success, especially with the advent of native 3D diffusion models such as Trellis. However, extending these methods to articulated objects by training native 3D diffusion models poses significant challenges. In this work, we present FreeArt3D, a training-free framework for articulated 3D object generation. Instead of training a new model on limited articulated data, FreeArt3D repurposes a pre-trained static 3D diffusion model (e.g., Trellis) as a powerful shape prior. It extends Score Distillation Sampling (SDS) into the 3D-to-4D domain by treating articulation as an additional generative dimension. Given a few images captured in different articulation states, FreeArt3D jointly optimizes the object's geometry, texture, and articulation parameters without requiring task-specific training or access to large-scale articulated datasets. Our method generates high-fidelity geometry and textures, accurately predicts underlying kinematic structures, and generalizes well across diverse object categories. Despite following a per-instance optimization paradigm, FreeArt3D completes in minutes and significantly outperforms prior state-of-the-art approaches in both quality and versatility. Please check our website for more details: https://czzzzh.github.io/FreeArt3D

Paper Structure

This paper contains 18 sections, 10 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: FreeArt3D employs a per-shape optimization strategy. Given sparse-view images of different joint states as input, we jointly optimize two separate geometries (one for the body and one for the movable part), the joint parameters $\mathcal{J}$ (e.g., joint axis, pivot point), and optionally the joint states $\theta_k$, if they are not provided. During the coarse geometry and joint optimization stage, we sample an image $\mathbf{I}_k$ corresponding to joint state $\theta_k$ at each iteration and aim to construct an occupancy grid of the full object under this configuration. To achieve this, we query two hash grids and transform the coordinates according to the current joint parameters $\mathcal{J}$ and joint state $\theta_k$. The merged occupancy grid is then passed to a pretrained 3D diffusion model, Trellis [xiang2024structured], along with the image $\mathbf{I}_k$ to provide gradient guidance for optimization. After completing the coarse stage, we clean the merged, optimized voxels and input them into the pretrained second-stage diffusion and VAE models to generate fine-grained geometry and realistic textures.
  • Figure 2: Trellis inference results across different joint states. Since Trellis is trained on 3D data normalized to a unit cube, each articulated component (e.g., the body of a stapler or a desk) may appear at different scales across joint states, depending on how far the movable parts (e.g., the stapler handle or desk drawer) extend. This scale inconsistency hinders optimization convergence. To address this issue, we introduce a disk beneath the object that serves as a reference to support the entire cube, ensuring consistent component scales across different states.
  • Figure 3: Comparison between Singapo [liu2024singapo], Articulate-Anything [le2024articulate], and ours. Unlike baseline methods that rely on part retrieval and fail to reconstruct detailed geometry and textures, our method generates meshes that closely match the input images and successfully recover fine-grained geometric details, realistic textures, and accurate articulation structures.
  • Figure 4: Real-World Demo. For each shape, we capture six images of the object in different joint states. FreeArt3D effectively leverages these casually captured, unposed images to generate high-quality articulated objects with sharp geometry and realistic textures.
  • Figure 5: Generation Results of Multiple Parts and Joints. Our method can be easily extended to support the generation of multiple articulated parts and joints, enabling flexible configuration of all components in the generated objects.
  • ...and 1 more figure
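
The coarse-stage construction described in the Figure 1 caption (query two geometries, transform coordinates by the joint parameters and joint state, merge into one occupancy grid) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a revolute joint only, uses simple box functions as hypothetical stand-ins for the two hash grids, and merges the two occupancies with an element-wise maximum, which is an assumption rather than something the caption specifies. The names `rotation_matrix`, `posed_occupancy`, and `box` are illustrative.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about the direction `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def posed_occupancy(body_occ, part_occ, points, axis, pivot, theta):
    """Occupancy of the full object at joint state `theta` (revolute joint only).

    body_occ, part_occ: callables mapping (N, 3) points to (N,) occupancy values,
    both defined in the rest pose (hypothetical stand-ins for the two hash grids).
    """
    R = rotation_matrix(axis, theta)
    pivot = np.asarray(pivot, dtype=float)
    # Inverse-warp the query points into the part's rest frame:
    # x_rest = R^T (x - pivot) + pivot. For row vectors, v @ R equals (R^T v)^T.
    rest_points = (points - pivot) @ R + pivot
    # Assumption: merge the two geometries with an element-wise maximum.
    return np.maximum(body_occ(points), part_occ(rest_points))

def box(lo, hi):
    """Toy occupancy field: 1.0 inside the axis-aligned box [lo, hi], else 0.0."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    return lambda p: np.all((p >= lo) & (p <= hi), axis=-1).astype(float)

# Toy object: a fixed body block and a bar that swings about the z-axis.
body = box([-0.5, -0.1, -0.1], [-0.2, 0.1, 0.1])
part = box([0.0, -0.1, -0.1], [0.5, 0.1, 0.1])
queries = np.array([[0.0, 0.3, 0.0], [0.3, 0.0, 0.0], [-0.3, 0.0, 0.0]])

closed = posed_occupancy(body, part, queries, [0, 0, 1], [0, 0, 0], 0.0)
opened = posed_occupancy(body, part, queries, [0, 0, 1], [0, 0, 0], np.pi / 2)
print(closed, opened)
```

At `theta = 0` the bar lies along +x, so the second query point is occupied; after a quarter turn about the z-axis the bar lies along +y and the first query point is occupied instead, while the body point is occupied in both states. In an actual optimization the box functions would be replaced by differentiable fields so that gradients can reach the geometry, the joint axis, the pivot, and the joint state.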