ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation
Yufei Wang, Ziyu Wang, Mino Nakura, Pratik Bhowal, Chia-Liang Kuo, Yi-Ting Chen, Zackory Erickson, David Held
TL;DR
ArticuBot addresses the challenge of generalizing manipulation to unseen articulated objects by generating a large simulation-based dataset of opening trajectories, then distilling them into a hierarchical, point-cloud–driven policy. The high-level weighted-displacement model grounds sub-goals in the 3D scene, while a low-level diffusion policy handles the actual end-effector motions, conditioned on the planned goals. The approach achieves zero-shot sim2real transfer across multiple robot embodiments and real environments, outperforming prior methods in both simulation and real-world tests. This work advances universal articulated-object manipulation by enabling a single policy to operate across diverse object categories and robot platforms with practical robustness. The combination of large-scale simulation, hierarchical policy design, and diffusion-based low-level control yields a scalable path toward generalizable, real-world robotic manipulation of everyday objects.
Abstract
This paper presents ArticuBot, in which a single learned policy enables a robotic system to open diverse categories of unseen articulated objects in the real world. This task has long been challenging for robotics due to the large variations in the geometry, size, and articulation types of such objects. Our system, ArticuBot, consists of three parts: generating a large number of demonstrations in physics-based simulation, distilling all generated demonstrations into a point cloud-based neural policy via imitation learning, and performing zero-shot sim2real transfer to real robotic systems. Utilizing sampling-based grasping and motion planning, our demonstration generation pipeline is fast and effective, generating a total of 42.3k demonstrations over 322 training articulated objects. For policy learning, we propose a novel hierarchical policy representation, in which the high-level policy learns the sub-goal for the end-effector, and the low-level policy learns how to move the end-effector conditioned on the predicted goal. We demonstrate that this hierarchical approach achieves much better object-level generalization than the non-hierarchical version. We further propose a novel weighted displacement model for the high-level policy that grounds the prediction in the existing 3D structure of the scene, outperforming alternative policy representations. We show that our learned policy can zero-shot transfer to three different real robot settings: a fixed table-top Franka arm across two different labs, and an X-Arm on a mobile base, opening multiple unseen articulated objects across two labs, real lounges, and kitchens. Videos and code can be found on our project website: https://articubot.github.io/.
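To make the weighted displacement idea concrete, the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each point in the observed point cloud predicts an offset to the end-effector sub-goal together with a confidence score, and that the final sub-goal is the confidence-weighted average of the per-point votes. The function name `aggregate_goal`, the softmax normalization, and the reduction of the sub-goal to a single 3D position are all assumptions made for illustration; in practice the offsets and scores would come from a learned network rather than being supplied by hand.

```python
import math


def aggregate_goal(points, displacements, logits):
    """Combine per-point goal votes into one 3D sub-goal (hypothetical helper).

    points:        list of (x, y, z) scene points
    displacements: list of (dx, dy, dz), one predicted offset per point
    logits:        list of unnormalized per-point confidence scores
    """
    # Assumption: weights are a softmax over points (shifted by the max
    # logit for numerical stability), so they are positive and sum to 1.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Each point votes for goal = point + displacement; the prediction is
    # the weighted average of these votes, so it stays anchored to the
    # observed geometry.
    goal = [0.0, 0.0, 0.0]
    for p, d, w in zip(points, displacements, weights):
        for k in range(3):
            goal[k] += w * (p[k] + d[k])
    return tuple(goal)


# Toy example: three scene points whose offsets all point at the same target.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
target = (0.5, 0.5, 0.2)
displacements = [tuple(t - c for t, c in zip(target, p)) for p in points]
logits = [2.0, 0.5, -1.0]
goal = aggregate_goal(points, displacements, logits)
```

When every point agrees on the target, the aggregated goal equals that target regardless of the weights; when the votes disagree, points with higher scores dominate. Under the hierarchical design described above, a sub-goal of this kind would then be passed to the low-level policy as conditioning.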
