Dexterous Manipulation Transfer via Progressive Kinematic-Dynamic Alignment

Wenbin Bai, Qiyu Chen, Xiangbo Lin, Jianwen Li, Quancheng Li, Hejiang Pan, Yi Sun

TL;DR

Dexterous Manipulation Transfer via Progressive Kinematic-Dynamic Alignment (PKDA) presents a hand-agnostic framework that converts human manipulation videos into high-quality dexterous-hand trajectories without requiring massive training data. It combines four modules (Interaction Perceptor, Trajectory Proposer, ContactAdapt Optimizer, and Wrist Trajectory Planner) that perform progressive kinematic mapping and residual RL to optimize hand-object contacts while preserving manipulation semantics. The approach generalizes across the Adroit, Allegro, and Leap hands, achieving an average transfer success rate of 73%, and shows improved transfer efficiency over state-of-the-art baselines. Real-world tests on a UR10 with a Leap Hand validate practical feasibility through open-loop execution of simulation-derived trajectories for common tasks such as Shake, Pour, and Stamp, highlighting PKDA's potential for scalable dexterous manipulation data generation.

Abstract

The inherent difficulty and limited scalability of collecting manipulation data on multi-fingered robot hand hardware have resulted in severe data scarcity, impeding research on data-driven dexterous manipulation policy learning. To address this challenge, we present a hand-agnostic manipulation transfer system. It efficiently converts human hand manipulation sequences from demonstration videos into high-quality dexterous manipulation trajectories without requiring massive training data. To tackle the multi-dimensional disparities between human hands and dexterous hands, as well as the challenges posed by high-degree-of-freedom coordinated control of dexterous hands, we design a progressive transfer framework: first, we establish primary control signals for dexterous hands based on kinematic matching; subsequently, we train residual policies with action space rescaling and thumb-guided initialization to dynamically optimize contact interactions under unified rewards; finally, we compute wrist control trajectories with the objective of preserving operational semantics. Using only human hand manipulation videos, our system automatically configures its parameters for different tasks, balancing kinematic matching and dynamic optimization across dexterous hands, object categories, and tasks. Extensive experimental results demonstrate that our framework can automatically generate smooth and semantically correct dexterous hand manipulation that faithfully reproduces human intentions, achieving high efficiency and strong generalizability with an average transfer success rate of 73%, and providing an easily implementable and scalable method for collecting robot dexterous manipulation data.

Paper Structure

This paper contains 33 sections, 13 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: PKDA Overview: Starting from human video demonstrations, human manipulation is mapped to various dexterous hands. The system achieves perception-control integration for dexterous manipulation transfer by addressing four core questions: “what to grasp, what to do, how to grasp, and how to manipulate.”
  • Figure 2: The PKDA system comprises four modules. Interaction Perceptor extracts key manipulation cues, including human hand posture $H$, object pose $O$, contact point $C$, and object mesh, from human demonstration videos. Trajectory Proposer retargets human hand movements into dexterous hand joint angle sequences, generating a primary control signal $A_{primary}$ to guide the primary trajectory. However, lacking dynamic adjustment, this trajectory often leads to failed grasps (see red box). To improve grasp stability, the ContactAdapt Optimizer employs RL, where a residual policy modifies $A_{primary}$ through $\Delta a_{t}$. Wrist Trajectory Planner combines object motion with relative hand-object constraints to synthesize wrist trajectories and generate the complete control signal.
  • Figure 3: RL-Configurator (left) standardizes diverse tasks by configuring pre-grasp and object goal state. The Action Space Rescaling module (right) compresses wrist motion space into the neighborhood of the pre-grasp to enable efficient hand-object interaction exploration.
  • Figure 4: Learning efficiency comparison on the TCDM task using the Adroit Hand.
  • Figure 5: Comparison of transfer quality on the repetitive task of knocking a nail.
  • ...and 7 more figures
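
The captions for Figures 2 and 3 describe two mechanisms that a small sketch can make concrete: a residual $\Delta a_{t}$ added to the primary control signal $A_{primary}$, and an action space rescaled into a neighborhood of the pre-grasp. The code below is only an illustration under stated assumptions and is not the authors' implementation. The function names, the normalized action range of [-1, 1], the additive form of the residual, and the values of `residual_scale` and `radius` are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def clamp(x: float, low: float, high: float) -> float:
    """Restrict x to the closed interval [low, high]."""
    return max(low, min(high, x))


def rescale_wrist_action(
    raw_action: Sequence[float],
    pregrasp_position: Sequence[float],
    radius: float,
) -> List[float]:
    """Map a normalized action in [-1, 1] to a wrist target near the pre-grasp.

    Assumption: the neighborhood is an axis-aligned box of half-width
    `radius` centered on the pre-grasp position.
    """
    return [
        p + radius * clamp(a, -1.0, 1.0)
        for p, a in zip(pregrasp_position, raw_action)
    ]


def apply_residual(
    a_primary_t: Sequence[float],
    delta_a_t: Sequence[float],
    joint_low: Sequence[float],
    joint_high: Sequence[float],
    residual_scale: float = 0.1,
) -> List[float]:
    """Add a bounded residual to the primary joint command.

    Assumption: the residual is additive, scaled by `residual_scale`,
    and the result is clamped to the joint limits.
    """
    return [
        clamp(a + residual_scale * clamp(d, -1.0, 1.0), lo, hi)
        for a, d, lo, hi in zip(a_primary_t, delta_a_t, joint_low, joint_high)
    ]


if __name__ == "__main__":
    # Wrist exploration stays within 5 cm of a hypothetical pre-grasp position.
    print(rescale_wrist_action([1.0, -2.0, 0.0], [0.5, 0.0, 0.2], radius=0.05))
    # The second joint would exceed its upper limit, so it is clamped.
    print(apply_residual([0.2, 1.5], [0.5, 1.0], [0.0, 0.0], [1.55, 1.55]))
```

Bounding the residual keeps the corrected command close to the kinematically retargeted one, and clamping the wrist target to a small box limits exploration to the region where hand-object contact can occur.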