Part-Guided 3D RL for Sim2Real Articulated Object Manipulation
Pengwei Xie, Rui Chen, Siang Chen, Yuzhe Qin, Fanbo Xiang, Tianyu Sun, Jing Xu, Guijin Wang, Hao Su
TL;DR
This work tackles the manipulation of unseen articulated objects from visual feedback, without demonstrations. It introduces a part-guided 3D RL framework that combines 2D segmentation with 3D RL and uses Frame-consistent Uncertainty-aware Sampling (FUS) to build a robust 3D representation, so that a single policy trained in simulation can transfer to real robots. The method is evaluated in both simulation and real-world settings across multiple object categories, and ablation studies show the value of combining segmentation uncertainty with frame consistency in sampling. Because one policy covers multiple manipulation tasks, the approach is a step toward versatile, end-to-end articulated-object manipulation on real robots.
Abstract
Manipulating unseen articulated objects through visual feedback is a critical but challenging task for real robots. Existing learning-based solutions mainly rely on visual affordance learning or other pre-trained visual models to guide manipulation policies, and these struggle with novel instances in real-world scenarios. In this paper, we propose a novel part-guided 3D RL framework that learns to manipulate articulated objects without demonstrations. We combine the strengths of 2D segmentation and 3D RL to improve the efficiency of RL policy training. To improve the stability of the policy on real robots, we design a Frame-consistent Uncertainty-aware Sampling (FUS) strategy that produces a condensed and hierarchical 3D representation. In addition, a single versatile RL policy can be trained on multiple articulated object manipulation tasks simultaneously in simulation, and it generalizes well to novel categories and instances. Experimental results demonstrate the effectiveness of our framework in both simulation and real-world settings. Our code is available at https://github.com/THU-VCLab/Part-Guided-3D-RL-for-Sim2Real-Articulated-Object-Manipulation.
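The abstract does not spell out how FUS works, so the following is only a minimal sketch of one way a frame-consistent, uncertainty-aware sampler could be written. It is not the authors' implementation. The function name `sample_part_points`, the parameters `per_part` and `momentum`, the blending of consecutive frames, and the use of `1 - max probability` as the uncertainty score are all assumptions made for illustration. It assumes per-point part probabilities are already available (for example, 2D segmentation scores back-projected onto the point cloud) and that point `i` corresponds across the two frames.

```python
import numpy as np


def sample_part_points(probs, points, prev_probs=None, per_part=4, momentum=0.5):
    """Pick a fixed number of confident 3D points for each part (illustrative only).

    probs:      (N, K) per-point part probabilities for the current frame.
    points:     (N, 3) 3D coordinates of the same N points.
    prev_probs: optional (N, K) probabilities from the previous frame.
    Returns a (K, per_part, 3) array and the (possibly blended) probabilities.
    """
    # Assumed form of frame consistency: blend with the previous frame so that
    # part labels do not flicker between consecutive observations.
    if prev_probs is not None:
        probs = momentum * prev_probs + (1.0 - momentum) * probs

    labels = probs.argmax(axis=1)
    # Assumed uncertainty score: low when one part clearly dominates.
    uncertainty = 1.0 - probs.max(axis=1)

    sampled = []
    for k in range(probs.shape[1]):
        idx = np.flatnonzero(labels == k)
        # Keep the most confident points assigned to part k.
        order = np.argsort(uncertainty[idx], kind="stable")[:per_part]
        idx = idx[order]
        # Zero-pad so every part contributes the same number of points.
        part = np.zeros((per_part, 3))
        part[: len(idx)] = points[idx]
        sampled.append(part)
    return np.stack(sampled), probs
```

Calling `sample_part_points(probs, points, prev_probs=blended)` on each new frame, where `blended` is the second value returned for the previous frame, yields a fixed-size, part-grouped point set that could serve as a compact policy input. The fixed output shape is a design choice of this sketch, not something stated in the abstract.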
