
ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation

Yufei Wang, Ziyu Wang, Mino Nakura, Pratik Bhowal, Chia-Liang Kuo, Yi-Ting Chen, Zackory Erickson, David Held

TL;DR

ArticuBot addresses the challenge of generalizing manipulation to unseen articulated objects by generating a large simulation-based dataset of opening trajectories, then distilling them into a hierarchical, point-cloud–driven policy. The high-level weighted-displacement model grounds sub-goals in the 3D scene, while a low-level diffusion policy handles the actual end-effector motions, conditioned on the planned goals. The approach achieves zero-shot sim2real transfer across multiple robot embodiments and real environments, outperforming prior methods in both simulation and real-world tests. This work advances universal articulated-object manipulation by enabling a single policy to operate across diverse object categories and robot platforms with practical robustness. The combination of large-scale simulation, hierarchical policy design, and diffusion-based low-level control yields a scalable path toward generalizable, real-world robotic manipulation of everyday objects.
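The hierarchy described above can be pictured as a control loop. A high-level model maps the observation to a sub-goal end-effector pose, and a low-level model maps the observation, the current pose, and that sub-goal to a delta transformation of the end-effector. The sketch below is illustrative only. The function names, the 4x4 pose representation, re-predicting the sub-goal at every step, and left-multiplying the delta in the world frame are all assumptions rather than ArticuBot's actual implementation. The two "policies" are toy stand-ins, not learned models.

```python
import numpy as np
from typing import Callable

# Hypothetical type aliases for this sketch (not from the paper's code).
PointCloud = np.ndarray  # (N, 3) scene points
Pose = np.ndarray        # (4, 4) homogeneous end-effector pose

def apply_delta(pose: Pose, delta: Pose) -> Pose:
    """Apply a delta transform to the end-effector pose.

    Assumption: the delta is a world-frame transform that is left-multiplied;
    the frame convention used by ArticuBot may differ.
    """
    return delta @ pose

def hierarchical_rollout(
    high_level: Callable[[PointCloud, Pose], Pose],
    low_level: Callable[[PointCloud, Pose, Pose], Pose],
    observe: Callable[[], PointCloud],
    pose: Pose,
    num_steps: int = 10,
) -> Pose:
    """Run a goal-conditioned hierarchical policy for num_steps steps.

    Assumption: the sub-goal is re-predicted at every step from the latest
    observation; the real replanning schedule may differ.
    """
    for _ in range(num_steps):
        points = observe()
        goal = high_level(points, pose)        # sub-goal end-effector pose
        delta = low_level(points, pose, goal)  # delta-transformation action
        pose = apply_delta(pose, delta)
    return pose

# Toy stand-ins for the two learned policies, for illustration only.
GOAL = np.eye(4)
GOAL[:3, 3] = [0.5, 0.0, 0.2]

def toy_high_level(points: PointCloud, pose: Pose) -> Pose:
    return GOAL

def toy_low_level(points: PointCloud, pose: Pose, goal: Pose) -> Pose:
    # Pure translation that covers half of the remaining distance to the goal.
    delta = np.eye(4)
    delta[:3, 3] = 0.5 * (goal[:3, 3] - pose[:3, 3])
    return delta

final_pose = hierarchical_rollout(
    toy_high_level, toy_low_level, lambda: np.zeros((0, 3)), np.eye(4)
)
```

With the toy policies, each step halves the remaining distance, so after 10 steps the end-effector sits within about a millimetre of the sub-goal. The point of the split is that the low-level model only has to learn "how to move toward a given goal," while object-specific reasoning lives in the high-level goal prediction.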

Abstract

This paper presents ArticuBot, in which a single learned policy enables a robotic system to open diverse categories of unseen articulated objects in the real world. This task has long been challenging for robotics due to the large variations in the geometry, size, and articulation types of such objects. Our system, ArticuBot, consists of three parts: generating a large number of demonstrations in physics-based simulation, distilling all generated demonstrations into a point cloud-based neural policy via imitation learning, and performing zero-shot sim2real transfer to real robotic systems. Utilizing sampling-based grasping and motion planning, our demonstration generation pipeline is fast and effective, generating a total of 42.3k demonstrations over 322 training articulated objects. For policy learning, we propose a novel hierarchical policy representation, in which the high-level policy learns the sub-goal for the end-effector, and the low-level policy learns how to move the end-effector conditioned on the predicted goal. We demonstrate that this hierarchical approach achieves much better object-level generalization than the non-hierarchical version. We further propose a novel weighted displacement model for the high-level policy that grounds the prediction in the existing 3D structure of the scene, outperforming alternative policy representations. We show that our learned policy can zero-shot transfer to three different real robot settings: a fixed table-top Franka arm in two different labs and an X-Arm on a mobile base, opening multiple unseen articulated objects across two labs, real lounges, and kitchens. Videos and code can be found on our project website: https://articubot.github.io/.
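The weighted displacement idea (as summarized in the Figure 1 caption) is that every scene point predicts a displacement to the sub-goal end-effector plus a weight, and the final prediction is the weighted average of the per-point predictions. Because each prediction is anchored at an observed point, the output is grounded in the scene's 3D structure. A minimal NumPy sketch of only that aggregation step follows. It assumes the goal gripper is represented by K keypoints (the caption mentions "goal end-effector points"), and that the weights are a softmax over points. Both choices, and the function name, are hypothetical, and the network that produces the displacements and logits is not shown.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_displacement_goal(
    points: np.ndarray,         # (N, 3) scene point cloud
    displacements: np.ndarray,  # (N, K, 3) offset from each point to each goal keypoint
    weight_logits: np.ndarray,  # (N,) per-point scores (softmax over N is an assumption)
) -> np.ndarray:
    """Return (K, 3) sub-goal gripper keypoints as a weighted vote of all points."""
    votes = points[:, None, :] + displacements  # (N, K, 3): each point's own guess
    w = softmax(weight_logits, axis=0)          # (N,) non-negative, sums to 1
    return np.einsum("n,nkd->kd", w, votes)     # weighted average over points
```

If every point's displacement lands exactly on the goal, the weights do not matter. When one point's weight dominates, the prediction collapses to that point's vote. Recovering a 6-DoF pose from the predicted keypoints (for example, with a least-squares rigid fit) would be a separate step that is not shown here.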

Paper Structure

This paper contains 43 sections, 4 equations, 18 figures, 4 tables.

Figures (18)

  • Figure 1: System overview of ArticuBot. Top: We combine sampling-based grasping, motion planning, and opening actions to efficiently generate thousands of demonstrations in simulation. These demonstrations are distilled into a hierarchical policy via imitation learning and then zero-shot transferred to the real world. Middle: We propose a weighted displacement model for the high-level policy, which predicts the sub-goal end-effector pose. The weighted displacement model predicts the displacement from each point in the point cloud observation to the sub-goal end-effector, as well as a weight for each point. The final prediction is the weighted average of each point's prediction. Bottom: We propose a goal-conditioned 3D diffusion policy for the low-level policy, which first applies attention between the current end-effector points, the scene points, and the goal end-effector points to obtain a latent embedding, and then performs diffusion on the latent embedding to generate the action, which is the delta transformation of the robot end-effector.
  • Figure 2: Comparison of hierarchical and non-hierarchical policies.
  • Figure 3: Comparison of different high-level policies. Leftmost: Train and test without camera randomizations. Right: Train with camera randomizations, and test with no camera randomization, with camera randomizations from training distribution, and with camera randomizations from an unseen test distribution.
  • Figure 4: The three different real robot setups.
  • Figure 5: Real-world test objects for table-top and mobile-base experiments.
  • ...and 13 more figures
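The low-level policy in Figure 1 first applies attention among the current end-effector points, the scene points, and the goal end-effector points to form a latent embedding that conditions the diffusion model. The NumPy sketch below shows one plausible form of that encoding step using single-head scaled dot-product attention. It is a hypothetical illustration, not the paper's architecture. The assumptions are: per-point features are given; the gripper tokens (current and goal) act as queries over all tokens; the projection matrices are random placeholders for learned weights; and the result is mean-pooled. The diffusion denoiser that consumes the latent is not shown.

```python
import numpy as np

def scaled_dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head attention: softmax(q k^T / sqrt(d)) v, row-wise softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

def encode_observation(
    scene_feats: np.ndarray,  # (N, D) per-point features of the scene point cloud
    ee_feats: np.ndarray,     # (K, D) features of the current end-effector points
    goal_feats: np.ndarray,   # (K, D) features of the predicted goal end-effector points
    Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray,  # (D, D) placeholder projections
) -> np.ndarray:
    """Return a (D,) latent embedding (hypothetical design, see lead-in)."""
    tokens = np.concatenate([scene_feats, ee_feats, goal_feats], axis=0)
    queries = np.concatenate([ee_feats, goal_feats], axis=0)
    attended = scaled_dot_product_attention(queries @ Wq, tokens @ Wk, tokens @ Wv)
    return attended.mean(axis=0)  # pool gripper tokens into one latent vector
```

Because attention sums over keys, the latent does not depend on the order of the scene points, which suits unordered point-cloud input.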