Part-Guided 3D RL for Sim2Real Articulated Object Manipulation

Pengwei Xie, Rui Chen, Siang Chen, Yuzhe Qin, Fanbo Xiang, Tianyu Sun, Jing Xu, Guijin Wang, Hao Su

TL;DR

This work tackles the challenge of manipulating unseen articulated objects using visual feedback without demonstrations. It introduces a part-guided 3D RL framework that fuses 2D segmentation with 3D RL and employs Frame-consistent Uncertainty-aware Sampling to build robust 3D representations, enabling a single policy trained in simulation to generalize to real robots. The method demonstrates strong sim-to-real performance across multiple object categories, and ablation studies show the value of combining segmentation uncertainty with frame-consistency in sampling. The approach offers a scalable path toward versatile, end-to-end articulated-object manipulation with practical impact in real-world robotics.

Abstract

Manipulating unseen articulated objects through visual feedback is a critical but challenging task for real robots. Existing learning-based solutions mainly focus on visual affordance learning or other pre-trained visual models to guide manipulation policies, which face challenges for novel instances in real-world scenarios. In this paper, we propose a novel part-guided 3D RL framework, which can learn to manipulate articulated objects without demonstrations. We combine the strengths of 2D segmentation and 3D RL to improve the efficiency of RL policy training. To improve the stability of the policy on real robots, we design a Frame-consistent Uncertainty-aware Sampling (FUS) strategy to get a condensed and hierarchical 3D representation. In addition, a single versatile RL policy can be trained on multiple articulated object manipulation tasks simultaneously in simulation and shows great generalizability to novel categories and instances. Experimental results demonstrate the effectiveness of our framework in both simulation and real-world settings. Our code is available at https://github.com/THU-VCLab/Part-Guided-3D-RL-for-Sim2Real-Articulated-Object-Manipulation.

Paper Structure

This paper contains 16 sections, 6 equations, 10 figures, 4 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Our 3D RL framework can be trained for various articulated object manipulation tasks in simulation simultaneously and efficiently. After training, without any demonstrations, the versatile RL policy can be deployed to a real robot and perform different tasks.
  • Figure 2: Framework Overview. 1) We take hand-centric visual observation (i.e., RGB-D images) as input and predict the part segmentation map using a pre-trained segmentation network. 2) 3D part masked points are transformed from the depth image using the camera parameters. 3) Our proposed FUS strategy combines uncertainty and consistency weights to generate per-point weights. These weights are used to sample points for each part. 4) Geometric features extracted using PointNet are combined with the robot states, and fed to the RL algorithm to get the action. After the robot executes the action, a new observation is obtained, and the process iterates from the beginning.
  • Figure 3: We calculate consistency weights from several consecutive frames. Candidate points closer to previously sampled points from the same part are allocated larger sampling weights.
  • Figure 4: Real experimental setup and 3 categories of articulated objects for our experiment.
  • Figure 5: Comparisons of our method with baselines on three single tasks and the HybridTask in simulation. The HybridTask involves all three categories of articulated objects. The results are averaged over 7 random seeds.
  • ...and 5 more figures
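
The pipeline described in the captions of Figures 2 and 3 can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the pinhole camera model, the exponential distance kernel, the `sigma` parameter, and the choice to multiply the two weights are all assumptions made for illustration; the captions only state that uncertainty and consistency weights are combined into per-point sampling weights, and that candidates closer to previously sampled points of the same part receive larger weights.

```python
import numpy as np

def backproject(depth, mask, fx, fy, cx, cy):
    """Lift masked depth pixels to 3D camera-frame points (assumed pinhole model)."""
    valid = mask.astype(bool) & (depth > 0)
    v, u = np.nonzero(valid)              # pixel rows (v) and columns (u)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)    # shape (N, 3)

def consistency_weights(candidates, prev_samples, sigma=0.05):
    """Give larger weights to candidates near previously sampled points.

    The exponential kernel and sigma are hypothetical choices, not from the paper.
    """
    if len(prev_samples) == 0:
        return np.ones(len(candidates))
    diff = candidates[:, None, :] - prev_samples[None, :, :]
    nearest = np.linalg.norm(diff, axis=2).min(axis=1)
    return np.exp(-nearest / sigma)

def sample_part_points(points, confidence, prev_samples, n, rng, sigma=0.05):
    """Sample n points of one part using combined per-point weights.

    `confidence` stands in for the uncertainty weight (for example, the
    segmentation network's per-pixel probability). Combining by product
    is an assumption.
    """
    w = confidence * consistency_weights(points, prev_samples, sigma) + 1e-8
    p = w / w.sum()
    replace = len(points) < n             # allow repeats if the part is small
    idx = rng.choice(len(points), size=n, replace=replace, p=p)
    return points[idx]

# Usage with synthetic data: a flat surface 2 m away and a 2x2 part mask.
rng = np.random.default_rng(0)
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
pts = backproject(depth, mask, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
conf = np.ones(len(pts))
sampled = sample_part_points(pts, conf, prev_samples=pts[:1], n=2, rng=rng)
```

Sampling per part with a fixed budget `n` keeps the point cloud small and balanced across parts, which matches the "condensed" representation the abstract mentions; the sampled points from one frame would then serve as `prev_samples` for the next.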