
Dexterous Pre-grasp Manipulation for Human-like Functional Categorical Grasping: Deep Reinforcement Learning and Grasp Representations

Dmytro Pavlichenko, Sven Behnke

TL;DR

The paper tackles dexterous pre-grasp manipulation to achieve functional grasps of human-oriented tools using a single deep reinforcement learning (DRL) policy trained from scratch. It introduces two grasp representations (explicit target grasps and more abstract constraint-based targets) and a dense, multi-component reward that guides learning without demonstrations, combined with a curriculum to speed up convergence. Across drills, spray bottles, and mugs, the approach achieves high success rates on both seen and unseen instances (roughly 94% on test objects with explicit targets and about 90% with constraint-based targets). The policy learns human-like strategies such as repositioning, reorienting, and uprighting objects on its own. The work also discusses limitations for real-world transfer and proposes a concrete three-step path to practical deployment: distillation of privileged observations, occlusion-aware rewards, and sim-to-real fine-tuning.

Abstract

Many objects, such as tools and household items, can be used only if grasped in a very specific way: grasped functionally. Often, though, a direct functional grasp is not possible. We propose a method for learning a dexterous pre-grasp manipulation policy to achieve human-like functional grasps using deep reinforcement learning. We introduce a dense multi-component reward function that enables learning a single policy capable of dexterous pre-grasp manipulation of novel instances of several known object categories with an anthropomorphic hand. The policy is learned purely by means of reinforcement learning from scratch, without any expert demonstrations. It implicitly learns to reposition and reorient objects of complex shapes to achieve given functional grasps. In addition, we explore two different ways to represent a desired grasp: explicit and more abstract, constraint-based. We show that our method consistently learns to successfully manipulate and achieve desired grasps on previously unseen object instances of known categories using both grasp representations. Training is completed on a single GPU in under three hours.
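To make the difference between the two grasp representations concrete, here is a minimal Python sketch based only on the description in Figure 4. An explicit target fixes the 6D end-effector pose and the finger positions, while a constraint-based target fixes only the index fingertip 3D position and the end-effector orientation. The class names, the (w, x, y, z) quaternion convention, the error metric, and the tolerances are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Quat = tuple[float, float, float, float]  # (w, x, y, z), assumed unit-norm

@dataclass
class ExplicitGrasp:
    """Full target: 6D end-effector pose plus per-joint finger positions."""
    ee_position: Vec3
    ee_orientation: Quat
    finger_positions: list[float]

@dataclass
class ConstraintGrasp:
    """Abstract target: index fingertip 3D position plus end-effector orientation.
    Wrist position and the other fingers are left free for the policy."""
    index_tip_position: Vec3
    ee_orientation: Quat

def orientation_error(q1: Quat, q2: Quat) -> float:
    """Rotation angle (rad) between two unit quaternions; |dot| treats q and -q alike."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))

# Hypothetical tolerances, not values from the paper.
POS_TOL_M, ROT_TOL_RAD, FINGER_TOL = 0.01, 0.1, 0.05

def explicit_satisfied(t: ExplicitGrasp, ee_pos: Vec3, ee_quat: Quat,
                       fingers: list[float]) -> bool:
    """Every component of the explicit target must be matched."""
    return (len(fingers) == len(t.finger_positions)
            and math.dist(t.ee_position, ee_pos) < POS_TOL_M
            and orientation_error(t.ee_orientation, ee_quat) < ROT_TOL_RAD
            and all(abs(a - b) < FINGER_TOL
                    for a, b in zip(t.finger_positions, fingers)))

def constraint_satisfied(t: ConstraintGrasp, index_tip: Vec3, ee_quat: Quat) -> bool:
    """Only two quantities are checked, so many hand configurations can pass."""
    return (math.dist(t.index_tip_position, index_tip) < POS_TOL_M
            and orientation_error(t.ee_orientation, ee_quat) < ROT_TOL_RAD)

# Usage: an index fingertip 5 mm from the target with matching orientation.
target = ConstraintGrasp((0.10, 0.00, 0.20), (1.0, 0.0, 0.0, 0.0))
print(constraint_satisfied(target, (0.105, 0.0, 0.20), (1.0, 0.0, 0.0, 0.0)))
```

The constraint-based check ignores the wrist position and the remaining fingers, which is what leaves the policy room to discover different hand configurations that satisfy the same constraint.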

Paper Structure (19 sections, 24 equations, 12 figures, 2 tables)

Figures (12)

  • Figure 1: Top: Dexterous pre-grasp manipulation that includes reorienting and repositioning a drill. Bottom: Provided only with a target index fingertip position and desired object orientation, our policy learned to utilize a human-like hand to achieve intuitive grasps for three object categories.
  • Figure 2: Composition of the state representation and the reward function. The state consists of information about the hand, the object, and the target functional grasp. The reward function consists of a term encouraging reaching the target grasp, a term encouraging pre-grasp manipulation, and a penalty for a low manipulability score. "O" denotes the object frame of reference.
  • Figure 3: Manipulation reward $r_{\textrm{man}}$ composed of three components: reach, hold, and orient, representing a sequence of interconnected tasks. Equidistant points between the thumb tip and middle fingertip & center, used to query distances to the object, are shown as red, blue, and green crosses. Rewards are shown as plus signs, with size proportional to the reward value. First, the motion of the equidistant points toward the object surface is rewarded by the reach reward. Second, equidistant points that are inside the object yield a bigger hold reward. Finally, orienting the object towards the nominal orientation yields an even bigger reward. Note that closing the hand brings the equidistant points closer together, often pushing them inside the object. This design implicitly rewards grasping behaviors without using expensive contact information or explicitly rewarding specific movement primitives.
  • Figure 4: Two target grasp representations. Left: Explicit grasp representation, consisting of a 6D end-effector pose and finger positions. Right: Constraint-based representation. The grasp is represented through the index fingertip 3D position and end-effector orientation. This representation allows the policy to explore different grasp configurations that satisfy the constraint.
  • Figure 5: Dataset of 39 objects of three categories: drills, spray bottles, and mugs. Each category has 13 objects: ten for training (gray background) and three for testing (green background).
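The reward structure described in Figures 2 and 3 can be sketched roughly as follows. This is a hypothetical Python illustration, not the paper's reward. The object is approximated by a sphere signed distance function, and the equidistant points are assumed to lie evenly on the segment between the thumb tip and the middle fingertip. The stage weights, the progress-based reach term, the gating of the orient term on holding, the exponential target-grasp term, and the manipulability threshold are all assumptions. The sketch only mirrors the qualitative structure: reach < hold < orient from Figure 3, and the target-grasp term plus manipulation term minus low-manipulability penalty from Figure 2.

```python
import math

Vec3 = tuple[float, float, float]

# Hypothetical weights, increasing per stage as described for Figure 3.
W_REACH, W_HOLD, W_ORIENT = 1.0, 2.0, 4.0

def sphere_sdf(center: Vec3, radius: float):
    """Stand-in object geometry: signed distance, negative inside the object."""
    return lambda p: math.dist(p, center) - radius

def equidistant_points(thumb_tip: Vec3, middle_tip: Vec3, n: int = 5) -> list[Vec3]:
    """n evenly spaced points strictly between the two fingertips."""
    return [tuple(a + (b - a) * (i + 1) / (n + 1) for a, b in zip(thumb_tip, middle_tip))
            for i in range(n)]

def manipulation_reward(points, sdf, prev_mean_dist: float,
                        orient_err_rad: float) -> tuple[float, float]:
    """Staged reach/hold/orient reward; also returns the new mean distance."""
    dists = [sdf(p) for p in points]
    mean_dist = sum(max(d, 0.0) for d in dists) / len(dists)  # outside points only
    inside = sum(d < 0.0 for d in dists) / len(dists)          # fraction inside
    # Reach: reward progress of the points toward the object surface.
    r = W_REACH * max(0.0, prev_mean_dist - mean_dist)
    # Hold: closing the hand pushes the points inside the object.
    r += W_HOLD * inside
    # Orient: counted only once something is held (assumed gating).
    if inside > 0.0:
        r += W_ORIENT * inside * (1.0 - orient_err_rad / math.pi)
    return r, mean_dist

def total_reward(grasp_err: float, r_man: float, manipulability: float,
                 min_manip: float = 0.1, w_penalty: float = 1.0) -> float:
    """Target-grasp term + manipulation term - low-manipulability penalty."""
    r_grasp = math.exp(-grasp_err)
    penalty = w_penalty if manipulability < min_manip else 0.0
    return r_grasp + r_man - penalty

# Usage: fingertips on opposite sides of a sphere of 5 cm radius, object already oriented.
obj = sphere_sdf((0.0, 0.0, 0.0), 0.05)
pts = equidistant_points((-0.06, 0.0, 0.0), (0.06, 0.0, 0.0))
r_man, d = manipulation_reward(pts, obj, prev_mean_dist=0.01, orient_err_rad=0.0)
print(round(r_man, 3), round(total_reward(0.0, r_man, manipulability=0.5), 3))
```

Because the points sit between the fingertips, closing the hand around the object moves them inside it, so the hold term rises without any contact sensing. This is the property Figure 3 highlights.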