DexPoint: Generalizable Point Cloud Reinforcement Learning for Sim-to-Real Dexterous Manipulation

Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Hao Su, Xiaolong Wang

TL;DR

DexPoint addresses generalization in dexterous manipulation under sim-to-real transfer by training a point-cloud policy for a multi-finger hand. It combines observed and imagined hand point clouds with a contact-based reward to enable category-level generalization to unseen objects and robust real-world deployment without real-world data. Empirical results with an Allegro Hand on XArm6 demonstrate successful sim-to-real transfer for grasping and door opening, with multi-object training improving generalization and outperforming a model-based baseline that requires object models. The work advances practical dexterous manipulation by leveraging geometry-focused sensing and reward design to bridge the sim-to-real gap.

Abstract

We propose a sim-to-real framework for dexterous manipulation that generalizes to new objects of the same category in the real world. The key to our framework is training the manipulation policy with point cloud inputs and dexterous hands. We propose two new techniques to enable joint learning on multiple objects and sim-to-real generalization: (i) using imagined hand point clouds as augmented inputs; and (ii) designing novel contact-based rewards. We empirically evaluate our method using an Allegro Hand to grasp novel objects in both simulation and the real world. To the best of our knowledge, this is the first policy learning-based framework that achieves such generalization results with dexterous hands. Our project page is available at https://yzqin.github.io/dexpoint

Paper Structure

This paper contains 14 sections, 5 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: We introduce a reinforcement learning method that takes point clouds as input for two manipulation tasks: grasping and door opening. By introducing several techniques into the policy learning process, our point cloud-based policy, trained purely in simulation, successfully generalizes to novel objects and transfers to the real world without any real-world data.
  • Figure 2: Real-world experiment setup: we use an Allegro Hand mounted on an XArm6 and a RealSense D435 camera facing toward the robot.
  • Figure 3: Architecture: our feature extractor takes the observed point cloud, imagined point cloud, robot proprioception, and goal pose as input and outputs a feature embedding. Both the actor and the critic take the same feature to predict the action and the value. The red points represent the imagined point cloud of the robot hand. Note that our network does not require RGB information.
  • Figure 4: Training curves. The left two plots show the single-object and multi-object training curves for (a) the bottle category and (b) the can category. The right three plots show the ablation results for (c) grasping bottles, (d) grasping cans, and (e) door opening. The x-axis is the number of training iterations and the y-axis is the normalized episodic return. The shaded area indicates the standard error, and performance is evaluated on five random seeds.
  • Figure 5: Real-world experiment: we evaluate our point cloud policy on various unseen objects.
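
The Figure 3 caption lists four inputs to the feature extractor: the observed point cloud, the imagined point cloud, robot proprioception, and the goal pose. As a rough illustration of how such inputs might be assembled, the minimal sketch below merges the two point clouds into one list and tags each point with its source. The two-value source flag, the function names `merge_point_clouds` and `build_policy_input`, and the dictionary layout are all assumptions made for this example; they are not taken from the paper or its code.

```python
from typing import Dict, List, Sequence

# Hypothetical per-point source flags (an assumption for illustration):
# they let a downstream network tell camera points from imagined hand points.
OBSERVED_FLAG = [1.0, 0.0]
IMAGINED_FLAG = [0.0, 1.0]


def merge_point_clouds(
    observed: Sequence[Sequence[float]],
    imagined: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return one list of [x, y, z, is_observed, is_imagined] rows."""
    merged = [list(p) + OBSERVED_FLAG for p in observed]
    merged += [list(p) + IMAGINED_FLAG for p in imagined]
    return merged


def build_policy_input(
    observed: Sequence[Sequence[float]],
    imagined: Sequence[Sequence[float]],
    proprioception: Sequence[float],
    goal_pose: Sequence[float],
) -> Dict[str, list]:
    """Bundle the four inputs named in the caption into one observation.

    No RGB channels are included; only geometry and robot state are used.
    """
    return {
        "points": merge_point_clouds(observed, imagined),
        "state": list(proprioception) + list(goal_pose),
    }


if __name__ == "__main__":
    camera_points = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    hand_points = [[0.0, 0.0, 0.1]]  # e.g. sampled from the hand model
    obs = build_policy_input(camera_points, hand_points, [0.0] * 3, [1.0, 2.0])
    print(len(obs["points"]), len(obs["state"]))
```

In this sketch the imagined hand points are simply passed in; how they are generated (for example, from the hand's joint state) is left outside the example.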