ArtiWorld: LLM-Driven Articulation of 3D Objects in Scenes

Yixuan Yang; Luyang Xie; Zhen Luo; Zixiang Zhao; Tongsheng Ding; Mingqi Gao; Feng Zheng

TL;DR

ArtiWorld introduces a scene-aware pipeline that automatically identifies articulable objects in 3D scenes and converts rigid assets into executable URDF-based articulated objects while preserving the original geometry. The core Arti4URDF model embeds 3D point-cloud geometry into a large language model to infer inter-part relations, joint types, and kinematic parameters, generating both a JSON-style kinematic tree and a complete URDF. Trained on PartNet-Mobility and PhysXNet and evaluated on simulated objects, simulated scenes, and real-world scans, ArtiWorld achieves state-of-the-art joint-type prediction and axis localization, with strong generalization to unseen categories and real-world data. The approach enables interactive, robot-ready simulation environments built directly from existing 3D assets, facilitating scalable robot learning and data augmentation.
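
To make the "JSON-style kinematic tree plus URDF" output concrete, the following is a minimal sketch of how such a tree could be serialized into URDF XML. The schema (`links`, `joints`, `origin_xyz`, and so on), the example cabinet, and the placeholder `effort` and `velocity` values are all assumptions made for illustration; they are not the format used by Arti4URDF, and mesh geometry is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical kinematic-tree schema, invented for this sketch
# (the actual Arti4URDF output format is not specified here).
tree = {
    "name": "cabinet",
    "links": ["base", "door"],
    "joints": [
        {
            "name": "door_hinge",
            "type": "revolute",
            "parent": "base",
            "child": "door",
            "origin_xyz": [0.3, 0.0, 0.0],
            "axis": [0.0, 0.0, 1.0],
            "limit": {"lower": 0.0, "upper": 1.57},
        }
    ],
}


def fmt(values):
    """Format a list of numbers as a space-separated URDF vector."""
    return " ".join(str(v) for v in values)


def tree_to_urdf(tree):
    """Serialize the kinematic tree into a URDF string (no geometry)."""
    robot = ET.Element("robot", name=tree["name"])
    for link in tree["links"]:
        ET.SubElement(robot, "link", name=link)
    for j in tree["joints"]:
        joint = ET.SubElement(robot, "joint", name=j["name"], type=j["type"])
        ET.SubElement(joint, "parent", link=j["parent"])
        ET.SubElement(joint, "child", link=j["child"])
        ET.SubElement(joint, "origin", xyz=fmt(j["origin_xyz"]), rpy="0 0 0")
        ET.SubElement(joint, "axis", xyz=fmt(j["axis"]))
        # URDF requires a <limit> element for revolute and prismatic joints;
        # effort and velocity below are placeholder values.
        if j["type"] in ("revolute", "prismatic"):
            lim = j["limit"]
            ET.SubElement(
                joint, "limit",
                lower=str(lim["lower"]), upper=str(lim["upper"]),
                effort="10", velocity="1",
            )
    return ET.tostring(robot, encoding="unicode")


urdf_text = tree_to_urdf(tree)
print(urdf_text)
```

In a full pipeline each `<link>` would also reference the original mesh through `<visual>` and `<collision>` elements, which is how the rigid asset's geometry would be carried over unchanged.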

Abstract

Building interactive simulators and scalable robot-learning environments requires a large number of articulated assets. However, most existing 3D assets in simulation are rigid, and manually converting them into articulated objects is extremely labor- and cost-intensive. This raises a natural question: can we automatically identify articulable objects in a scene and convert them into articulated assets directly? In this paper, we present ArtiWorld, a scene-aware pipeline that localizes candidate articulable objects from textual scene descriptions and reconstructs executable URDF models that preserve the original geometry. At the core of this pipeline is Arti4URDF, which leverages 3D point clouds, the prior knowledge of a large language model (LLM), and a URDF-oriented prompt design to rapidly convert rigid objects into interactive URDF-based articulated objects while maintaining their 3D shape. We evaluate ArtiWorld at three levels: 3D simulated objects, full 3D simulated scenes, and real-world scan scenes. Across all three settings, our method consistently outperforms existing approaches and achieves state-of-the-art performance, while preserving object geometry and correctly capturing object interactivity to produce usable URDF-based articulated models. This provides a practical path toward building interactive, robot-ready simulation environments directly from existing 3D assets. Code and data will be released.
