DexCatch: Learning to Catch Arbitrary Objects with Dexterous Hands

Fengbo Lan, Shengjie Wang, Yunzhe Zhang, Haotian Xu, Oluwatosin Oseni, Ziye Zhang, Yang Gao, Tao Zhang

TL;DR

Dexterous throwing-catching with dynamic contacts remains hard for robots to master. The authors propose LTC, a model-free RL framework built on PPO that couples a Lyapunov stability-based critic and an intrinsic advantage with perceptual priors from compressed object point clouds, enabling robust throwing and catching across diverse objects and hand postures, including sideways configurations. LTC achieves a 73% success rate across 45 scenarios and demonstrates strong zero-shot generalization to unseen objects, outperforming baselines and showing meaningful generalization to new gestures and object shapes. The work highlights the potential of stability-informed RL for rapid, reliable dynamic manipulation and discusses sim-to-real transfer through domain randomization and perception/pose estimation strategies.

Abstract

Achieving human-like dexterous manipulation remains a crucial area of research in robotics. Current research focuses on improving the success rate of pick-and-place tasks. Compared with pick-and-place, throwing-catching behavior has the potential to increase the speed of transporting objects to their destination. However, dynamic dexterous manipulation poses a major challenge for stable control due to a large number of dynamic contacts. In this paper, we propose a Learning-based framework for Throwing-Catching tasks using dexterous hands (LTC). Our method, LTC, achieves a 73% success rate across 45 scenarios (diverse hand poses and objects), and the learned policies demonstrate strong zero-shot transfer performance on unseen objects. Additionally, in tasks where the object in hand faces sideways, an extremely unstable scenario due to the lack of support from the palm, all baselines fail, while our method still achieves a success rate of over 60%.

Paper Structure

This paper contains 28 sections, 1 theorem, 12 equations, 9 figures, 5 tables.

Key Result

Lemma D.1

A continuous-time system reaches stability within a finite time $T\leq \frac{V_L^{1-\alpha}(s_0)}{k(1-\alpha)}$ if the following condition holds. Here $s_0$ is the initial state, $\dot{V}_L$ is the Lie derivative $L_f V_L$, and $k$ and $\alpha$ are constants ranging from 0 to 1.
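
The condition itself is not reproduced in this summary. As a sketch only, assume it takes the standard finite-time stability form $\dot{V}_L(s) \leq -k V_L^{\alpha}(s)$ with $V_L \geq 0$, $k > 0$, and $0 < \alpha < 1$; this form is an assumption, not a quotation from the paper. Under that assumption, the bound on $T$ follows by differentiating $V_L^{1-\alpha}$ along a trajectory $s_t$ (wherever $V_L(s_t) > 0$) and integrating:

$$
\frac{d}{dt} V_L^{1-\alpha}(s_t) = (1-\alpha)\, V_L^{-\alpha}(s_t)\, \dot{V}_L(s_t) \leq -k(1-\alpha)
$$

$$
V_L^{1-\alpha}(s_t) \leq V_L^{1-\alpha}(s_0) - k(1-\alpha)\, t
$$

Because $V_L \geq 0$, the left-hand side cannot be negative, so $V_L$ must reach zero no later than the time at which the right-hand side vanishes, which gives $T \leq \frac{V_L^{1-\alpha}(s_0)}{k(1-\alpha)}$.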

Figures (9)

  • Figure 1: Generalization tasks for throwing and catching objects with shadow hands. Shadow hands can throw and catch a variety of objects with different shapes, masses, and other properties when the hands face sideward.
  • Figure 2: A: The method takes as input the environmental observation and the point cloud feature of the object. Then, it learns the catching policy for dexterous hands through an Actor-Critic structure. B: The Lyapunov function, the policy function and the value function are estimated using neural networks. Their network structures are similar, and the only difference is the dimension of the output. For example, the Lyapunov function contains a linear layer, a batch normalization layer, and a penultimate normalization layer, and the last linear layer takes a scalar as output.
  • Figure 3: To enhance the success rate, our approach prioritizes optimizing rewards while ensuring the stability of the grasping gesture. By integrating Lyapunov stability alongside reward optimization, we mitigate the risk of completing tasks in unstable poses. The size of the stable region varies based on task difficulty, with simpler tasks offering a larger stable region.
  • Figure 4: Video snapshots of throwing-catching tasks performed by our method. The five tasks differ in hand posture and are listed from hardest to easiest: Overarm Catch, Overarm2Abreast Catch, Under2Overarm Catch, Abreast Catch, and Underarm Catch. Overarm Catch is the most challenging because both hands are oriented vertically. Notably, our method can catch diverse objects in all five tasks.
  • Figure 5: Average reward curves for training on multiple objects across the five tasks. The first row shows the comparison experiments, and the second row shows the ablation experiments. (Smoothed curves report the mean over 5 runs, and shaded areas show the variance.)
  • ...and 4 more figures

Theorems & Definitions (2)

  • Lemma D.1: Finite-Time Stability Convergence
  • proof
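
The finite-time bound in Lemma D.1 can be checked numerically. The sketch below is illustrative only: it assumes the boundary case $\dot{V}_L = -k V_L^{\alpha}$ of the standard finite-time stability condition (an assumption, since the condition is not shown in this summary), integrates it with forward Euler, and compares the simulated settling time with the bound $\frac{V_L^{1-\alpha}(s_0)}{k(1-\alpha)}$. The function names and parameter values are hypothetical and do not come from the paper.

```python
def settling_time_bound(v0: float, k: float, alpha: float) -> float:
    """Bound from Lemma D.1: T <= V_L(s_0)^(1 - alpha) / (k * (1 - alpha))."""
    if v0 < 0 or k <= 0 or not (0 < alpha < 1):
        raise ValueError("expected v0 >= 0, k > 0 and 0 < alpha < 1")
    return v0 ** (1 - alpha) / (k * (1 - alpha))


def simulate_settling_time(
    v0: float,
    k: float,
    alpha: float,
    dt: float = 1e-4,
    tol: float = 1e-9,
    t_max: float = 100.0,
) -> float:
    """Forward-Euler integration of dV/dt = -k * V^alpha (assumed boundary case).

    Returns the first simulated time at which V drops to tol or below,
    or roughly t_max if that never happens.
    """
    v, t = v0, 0.0
    while v > tol and t < t_max:
        # Clamp at zero: a Lyapunov function is non-negative by definition.
        v = max(v - dt * k * v ** alpha, 0.0)
        t += dt
    return t


if __name__ == "__main__":
    v0, k, alpha = 1.0, 0.5, 0.5  # hypothetical values for illustration
    print("bound on T:", settling_time_bound(v0, k, alpha))
    print("simulated settling time:", simulate_settling_time(v0, k, alpha))
```

In the boundary case the settling time coincides with the bound, so the two printed values should agree up to discretization error; any dynamics that decrease $V_L$ faster than the assumed condition would settle sooner.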