URDF-Anything+: Autoregressive Articulated 3D Models Generation for Physical Simulation

Zhuangzhe Wu, Yue Xin, Chengkai Hou, Minghao Chen, Yaoxu Lyu, Jieyu Zhang, Shanghang Zhang

Abstract

Articulated objects are fundamental to robotics, physics simulation, and interactive virtual environments. However, reconstructing them from visual input remains challenging, as it requires jointly inferring part geometry and kinematic structure. We present URDF-Anything+, an end-to-end autoregressive framework that directly generates executable articulated object models from visual observations. Given image and object-level 3D cues, our method sequentially produces part geometries and their associated joint parameters, yielding complete URDF models without relying on multi-stage pipelines. Generation proceeds until the model determines that all parts have been produced, so complete geometry and kinematics are inferred automatically. Building on this capability, we enable a new Real-Follow-Sim paradigm, in which high-fidelity digital twins constructed from visual observations allow policies trained and tested purely in simulation to transfer to real robots without online adaptation. Experiments on large-scale articulated object benchmarks and real-world robotic tasks demonstrate that URDF-Anything+ outperforms prior methods in geometric reconstruction quality, joint parameter accuracy, and physical executability.
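The sequential generation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Part` record, the `next_part` callback (a stand-in for the generative model, which returns `None` to signal that all parts have been produced), and the `max_parts` safety cap are hypothetical names introduced here. Only the URDF element names (`robot`, `link`, `joint`, `parent`, `child`, `origin`, `axis`, `limit`) come from the URDF format itself.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Part:
    """Hypothetical record for one generated part and the joint to its parent."""
    name: str
    mesh_path: str
    parent: Optional[str] = None          # None marks the root link
    joint_type: str = "fixed"             # e.g. "revolute", "prismatic", "fixed"
    origin_xyz: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    axis_xyz: Tuple[float, float, float] = (0.0, 0.0, 1.0)
    limits: Tuple[float, float] = (0.0, 0.0)


def generate_parts(next_part: Callable[[List[Part]], Optional[Part]],
                   max_parts: int = 32) -> List[Part]:
    """Call the (assumed) model one part at a time until it signals completion."""
    parts: List[Part] = []
    while len(parts) < max_parts:
        part = next_part(parts)           # conditioned on the parts produced so far
        if part is None:                  # stop signal: all parts have been produced
            break
        parts.append(part)
    return parts


def _fmt(values) -> str:
    """Format a tuple of numbers as a space-separated URDF attribute value."""
    return " ".join(str(v) for v in values)


def build_urdf(robot_name: str, parts: List[Part]) -> str:
    """Assemble links and joints into a URDF string."""
    robot = ET.Element("robot", name=robot_name)
    for p in parts:
        link = ET.SubElement(robot, "link", name=p.name)
        geometry = ET.SubElement(ET.SubElement(link, "visual"), "geometry")
        ET.SubElement(geometry, "mesh", filename=p.mesh_path)
    for p in parts:
        if p.parent is None:
            continue                      # the root link has no joint
        joint = ET.SubElement(robot, "joint",
                              name=f"{p.parent}_to_{p.name}", type=p.joint_type)
        ET.SubElement(joint, "parent", link=p.parent)
        ET.SubElement(joint, "child", link=p.name)
        ET.SubElement(joint, "origin", xyz=_fmt(p.origin_xyz), rpy="0 0 0")
        if p.joint_type != "fixed":
            ET.SubElement(joint, "axis", xyz=_fmt(p.axis_xyz))
        if p.joint_type in ("revolute", "prismatic"):
            # effort and velocity are placeholder values, not predicted quantities
            ET.SubElement(joint, "limit", lower=str(p.limits[0]),
                          upper=str(p.limits[1]), effort="10", velocity="1")
    return ET.tostring(robot, encoding="unicode")
```

In this sketch the stopping decision lives entirely in `next_part`, so the loop does not need to know the number of parts in advance; the cap only guards against a generator that never stops.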

Paper Structure

This paper contains 50 sections, 13 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Overview of the URDF-Anything Generation Pipeline. It shows how we autoregressively generate the 3D components of an object along with their spatial configurations using a DiT model, and then assemble them into a complete URDF file.
  • Figure 2: Structure of DiT and parameter decoders.
  • Figure 3: Pipeline for Real-to-Sim Digital Twin Construction. Our system captures RGB-D data from the real world, generates articulated URDF models using URDF-Anything+, aligns them with observed geometry via ICP, scales them to physical size, and instantiates them in Isaac Sim to create a spatially accurate digital twin.
  • Figure 4: Qualitative Results on the Test Set of Our Dataset. Compared with other methods, URDF-Anything+ generates high-quality 3D assets with more accurate geometry and articulation.
  • Figure 5: Qualitative Results on In-the-wild Images. Compared with other methods, URDF-Anything+ generates high-quality 3D assets with more accurate geometry and articulation.
  • ...and 9 more figures
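
The alignment and scaling steps named in the Figure 3 caption can be illustrated with a small NumPy sketch. It rests on assumptions that are not taken from the paper: scale is estimated from the ratio of axis-aligned bounding-box diagonals (a hypothetical choice that is only approximate when the two clouds are rotated relative to each other), and ICP is the basic point-to-point variant with brute-force nearest neighbours. The function names are illustrative.

```python
import numpy as np


def estimate_scale(model_pts: np.ndarray, observed_pts: np.ndarray) -> float:
    """Assumed heuristic: ratio of axis-aligned bounding-box diagonals."""
    model_diag = np.linalg.norm(model_pts.max(axis=0) - model_pts.min(axis=0))
    obs_diag = np.linalg.norm(observed_pts.max(axis=0) - observed_pts.min(axis=0))
    return float(obs_diag / model_diag)


def best_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares R, t with R @ src[i] + t ~ dst[i] for paired points (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # the sign fix avoids reflections
    t = dst_c - R @ src_c
    return R, t


def icp(src: np.ndarray, dst: np.ndarray, iters: int = 20):
    """Point-to-point ICP: alternate nearest-neighbour matching and Kabsch."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest neighbour in dst for every current point (O(N*M)).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        # Compose so that cur == src @ R_total.T + t_total after each step.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

A typical use would first rescale the generated model points by `estimate_scale(model, observed)` and then call `icp(scaled_model, observed)` to obtain the pose at which the model is placed in the simulator. Basic ICP only converges to a local optimum, so a reasonable initial pose is assumed.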