Learning Reactive Dexterous Grasping via Hierarchical Task-Space RL Planning and Joint-Space QP Control

Ho Jae Lee, Yonghyeon Lee, Alexander Alexiev, Tzu-Yuan Lin, Se Hwan Jeon, Sangbae Kim

Abstract

In this work, we propose a hybrid hierarchical control framework for reactive dexterous grasping that explicitly decouples high-level spatial intent from low-level joint execution. We introduce a multi-agent reinforcement learning architecture, specialized into distinct arm and hand agents, that acts as a high-level planner by generating desired task-space velocity commands. These commands are then processed by a GPU-parallelized quadratic programming controller, which translates them into feasible joint velocities while strictly enforcing kinematic limits and collision avoidance. This structural isolation not only accelerates training convergence but also strictly enforces hardware safety. Furthermore, the architecture unlocks zero-shot steerability, allowing system operators to dynamically adjust safety margins and avoid dynamic obstacles without retraining the policy. We extensively validate the proposed framework through a rigorous simulation-to-reality pipeline. Real-world hardware experiments on a 7-DoF arm equipped with a 20-DoF anthropomorphic hand demonstrate highly robust zero-shot transferability for dexterous grasping to a diverse set of unseen objects, highlighting the system's ability to reactively recover from unexpected physical disturbances in unstructured environments.
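
The step from task-space velocity commands to feasible joint velocities can be illustrated with a small box-constrained QP. The sketch below is not the authors' controller: it assumes a cost of the form $\|J\dot{\mathbf{q}} - \mathbf{v}_{\text{des}}\|^2 + \lambda\|\dot{\mathbf{q}}\|^2$, handles only joint-velocity bounds (no collision-avoidance constraints), runs on the CPU, and uses plain projected gradient descent in place of a dedicated QP solver. The function name `solve_velocity_qp`, the damping weight, and the example Jacobian are all hypothetical.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def solve_velocity_qp(J, v_des, qd_min, qd_max, damping=1e-3, iters=500):
    """Minimize ||J qd - v_des||^2 + damping * ||qd||^2
    subject to qd_min <= qd <= qd_max, using projected gradient descent.

    Illustrative only: the cost, the constraint set, and the solver are
    assumptions, not the formulation used in the paper.
    """
    Jt = transpose(J)
    # The Hessian is 2 (J^T J + damping I). Its largest eigenvalue is bounded
    # by 2 (||J||_F^2 + damping), so 1 / bound is a safe step size.
    lipschitz = 2.0 * (sum(a * a for row in J for a in row) + damping)
    step = 1.0 / lipschitz
    # Start from zero velocity, projected onto the bounds.
    qd = [min(max(0.0, lo), hi) for lo, hi in zip(qd_min, qd_max)]
    for _ in range(iters):
        residual = [r - v for r, v in zip(matvec(J, qd), v_des)]
        grad = [2.0 * (g + damping * q)
                for g, q in zip(matvec(Jt, residual), qd)]
        # Gradient step followed by projection (clipping) onto the box.
        qd = [min(max(q - step * g, lo), hi)
              for q, g, lo, hi in zip(qd, grad, qd_min, qd_max)]
    return qd


if __name__ == "__main__":
    # Hypothetical 2x3 task Jacobian and symmetric joint-velocity limits.
    J = [[1.0, 0.5, 0.0],
         [0.0, 1.0, 0.5]]
    print(solve_velocity_qp(J, [0.4, -0.2], [-0.3] * 3, [0.3] * 3))
```

Because the limits are enforced by the projection on every iteration, the returned joint velocities always lie inside the bounds, whatever task-space command the planner requests. Under these assumptions, this is the sense in which the low-level layer, rather than the learned policy, enforces the limits.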

Paper Structure

This paper contains 29 sections, 24 equations, 15 figures, 7 tables.

Figures (15)

  • Figure 1: Overview of the proposed multi-agent RL framework with hardware validation. Our manipulator platform, equipped with a five-finger anthropomorphic hand, successfully grasps and lifts a target object. The background overlay illustrates the policy decomposition. The arm agent $\pi_{\text{arm}}$ (red) commands the palm twist (red and orange arrows), while the hand agent $\pi_{\text{hand}}$ (blue) independently commands the local fingertip linear velocities (light blue arrows). The bottom row highlights successful grasps on arbitrary object shapes and deformables.
  • Figure 2: Overview of the hybrid hierarchical control framework. Our architecture consists of a high-level RL planner operating at 100 Hz and a low-level QP controller running at 500 Hz. The planner is decomposed into two specialized agents: (i) an arm agent that generates a global transport strategy via a desired palm twist ${}^{W}\boldsymbol{\mathcal{V}}_{P, \text{des}}$ in the world frame, and (ii) a hand agent that generates a local grasping strategy via desired fingertip linear velocities ${}^{P}\mathbf{v}_{F, \text{des}/P}$ relative to the palm. This decomposition enables a unified global-to-local manipulation strategy. These task-space velocities are processed by a joint-space constrained QP controller to compute optimal joint velocities $\dot{\mathbf{q}}_{\text{des}}$ while strictly enforcing physical safety and hardware constraints. This hierarchy lets the policies focus on learning high-level strategy by delegating physical feasibility and complex differential kinematics to the low-level controller.
  • Figure 3: Illustration of the fixed-base manipulation platforms and coordinate frames. Left: Two-fingered gripper. Right: Five-fingered anthropomorphic hand. Zoomed views show the local palm frame $P$ used for fingertip velocity mapping and the computation of hand-centric local observations. The target object is modeled as a minimum-volume bounding box (green), enabling the framework to generalize across arbitrary object shapes.
  • Figure 4: Simulation results demonstrating the reach-grasp-lift progression of the proposed framework. The top two rows show the 20-DoF 5F hand grasping (a) a toy airplane (ID: 072a) and (b) a pudding box (ID: 008). The bottom two rows show the 8-DoF 2F gripper grasping (c) a baseball (ID: 055) and (d) a pitcher base (ID: 019). The proposed control framework produces successful grasping strategies across varied object geometries and both end-effector platforms, demonstrating its robustness and adaptability in dexterous grasping tasks.
  • Figure 5: Evolution of the arm and hand policy action outputs over time corresponding to the toy airplane execution sequence shown in Figure 4(a). Phase transitions are intrinsically driven by the decreasing palm-to-object distance (bottom). During reaching, the palm rapidly approaches the object while the fingers remain relatively passive. During grasping, the palm linear velocity drops for precise positioning while the fingertip velocities peak to actively enclose the object. In the lifting stage, the palm translates to the desired pose while the fingers maintain continuous velocity commands to stabilize the grasp.
  • ...and 10 more figures
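
The global-to-local decomposition in the Figure 2 caption can be made concrete with standard rigid-body kinematics. The sketch below shows one way a world-frame palm twist and a palm-relative fingertip velocity combine into a world-frame fingertip velocity, and how many controller ticks fall within one planner step at the stated 100 Hz and 500 Hz rates. The excerpt does not give the paper's exact formulation, so the twist convention (linear part taken as the velocity of the palm origin, expressed in the world frame), the function names, and the numbers in the example are all assumptions.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def rotate(R, v):
    """Apply a 3x3 rotation matrix (list of rows) to a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def fingertip_velocity_world(v_palm, w_palm, R_wp, p_tip_in_palm, v_tip_rel_in_palm):
    """World-frame fingertip velocity from a palm twist and a relative command.

    v_palm, w_palm     : linear and angular palm velocity in the world frame
                         (assumed convention: v_palm is the palm-origin velocity)
    R_wp               : rotation taking palm-frame vectors to the world frame
    p_tip_in_palm      : fingertip position relative to the palm origin, in P
    v_tip_rel_in_palm  : fingertip velocity relative to the palm, in P
    """
    r_world = rotate(R_wp, p_tip_in_palm)         # lever arm in world frame
    transport = cross(w_palm, r_world)            # motion induced by palm rotation
    relative = rotate(R_wp, v_tip_rel_in_palm)    # hand-agent command in world frame
    return [v_palm[i] + transport[i] + relative[i] for i in range(3)]


def control_ticks_per_plan(planner_hz=100, controller_hz=500):
    """Number of low-level controller steps per high-level planner step."""
    if controller_hz % planner_hz != 0:
        raise ValueError("controller rate must be a multiple of planner rate")
    return controller_hz // planner_hz


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(fingertip_velocity_world([0.1, 0.0, 0.0], [0.0, 0.0, 1.0],
                                   identity, [0.1, 0.0, 0.0], [0.0, 0.0, 0.05]))
    print(control_ticks_per_plan())
```

If the planner output is simply held constant between updates (an assumption, since the excerpt does not describe the interface), each pair of agent commands would be tracked for five consecutive QP solves before the next policy step.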