SPIDER: Scalable Physics-Informed Dexterous Retargeting

Chaoyi Pan, Changhao Wang, Haozhi Qi, Zixi Liu, Homanga Bharadhwaj, Akash Sharma, Tingfan Wu, Guanya Shi, Jitendra Malik, Francois Hogan

TL;DR

SPIDER tackles the data scarcity barrier in dexterous and humanoid robot control by converting abundant human motion data into dynamically feasible robot trajectories through a scalable physics-informed retargeting framework. It combines sampling-based optimization with virtual contact guidance and trajectory robustification, enabling cross-embodiment retargeting across nine robot embodiments and six datasets and yielding a 2.4M-frame dynamically feasible dataset for policy learning. Key contributions include a formal physics-based retargeting problem, an annealed sampling solver, a curriculum-like virtual contact mechanism, robustness to sim-to-real gaps, and diverse physics-aware data augmentation. The approach improves success rates by 18% over standard sampling, runs 10X faster than RL baselines, works with open-world data of varying quality through single-RGB-to-trajectory pipelines, and shows benefits for practical deployment and policy learning.

Abstract

Learning dexterous and agile policies for humanoid and dexterous hand control requires large-scale demonstrations, but collecting robot-specific data is prohibitively expensive. In contrast, abundant human motion data is readily available from motion capture, videos, and virtual reality, which could help address the data scarcity problem. However, due to the embodiment gap and missing dynamic information such as force and torque, these demonstrations cannot be directly executed on robots. To bridge this gap, we propose Scalable Physics-Informed DExterous Retargeting (SPIDER), a physics-based retargeting framework that transforms and augments kinematic-only human demonstrations into dynamically feasible robot trajectories at scale. Our key insight is that human demonstrations should provide the global task structure and objective, while large-scale physics-based sampling with curriculum-style virtual contact guidance should refine trajectories to ensure dynamic feasibility and correct contact sequences. SPIDER scales across 9 diverse humanoid and dexterous hand embodiments and 6 datasets, improving success rates by 18% compared to standard sampling while being 10X faster than reinforcement learning (RL) baselines, and enables the generation of a 2.4M-frame dynamically feasible robot dataset for policy learning. As a universal physics-based retargeting method, SPIDER can work with data of varying quality and generate diverse, high-quality data that enables efficient policy learning with methods such as RL.

Paper Structure

This paper contains 24 sections, 5 equations, 10 figures, 4 tables, 1 algorithm.

Figures (10)

  • Figure 1: Overview of the SPIDER method. SPIDER converts human-object interaction trajectories into dynamically feasible robot-object interaction trajectories using sampling with a parallel physics simulator. We introduce an additional virtual contact guidance method to minimize solution ambiguity in contact-rich tasks. Combining the two, SPIDER converts human datasets into deployable robot data at scale and supports multiple distinct robot embodiments and task domains.
  • Figure 2: Overview of the SPIDER pipeline. The pipeline takes reconstructed object meshes, reference robot motion, and object motion and converts them into a dynamically feasible robot trajectory with corrected contacts. The generated trajectory is further robustified and augmented before deployment or policy learning.
  • Figure 3: Contact mode mismatch in sampling and the virtual guidance method that corrects it. (a) For the same task, the robot can hold the object in different contact modes and still finish the task, but the contact mode from the human demonstration is preferred. (b) Sampling methods compared when seeking a feasible motion with correct contact. Standard sampling (left) uses a fixed search radius, leading to high variance; the resulting solution fails to converge well. Annealed sampling (middle) gradually shrinks the search radius, starting coarse and narrowing down to a finer solution, but may drift toward a feasible solution with the wrong contact. Annealed sampling with virtual contact guidance (right) adds virtual guidance near target contacts, enlarging the feasible region to a relaxed feasible set and steering sampling away from undesired feasible solutions and towards the intended contact sequence.
  • Figure 4: Physics-based data augmentation. We augment the retargeted data from a single demonstration into a diverse set of physically feasible actions. Here we demonstrate (a) generating motion with a new object mesh for dexterous manipulation, (b) moving a lighter and smaller object for humanoid robot tasks, (c) adding stairs to the scene for humanoid running motion, and (d) applying external forces to the robot while it pulls a heavy object.
  • Figure 5: Specifications of robots used in evaluation. SPIDER supports both dexterous hands and humanoid robots. The significant variations in DoF, dimensions, and finger count demonstrate the cross-embodiment generalizability of our approach. We employ a simulated 12-DoF configuration (Li et al., 2025) of the Inspire and Ability hands, which removes the joint constraints present in their real-world versions.
  • ...and 5 more figures
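
The sampling strategies contrasted in the Figure 3 caption can be illustrated with a toy sketch. This is not SPIDER's solver: it assumes a one-dimensional double-well cost standing in for two contact modes, and the names `annealed_search`, `double_well`, and `prefer_right` are hypothetical, as is the linearly fading guidance weight. It only shows the general idea of shrinking the search radius over iterations and of adding a temporary guidance term that steers the search toward a preferred minimum.

```python
import random


def annealed_search(cost, x0, iters=30, samples=64, sigma0=0.3,
                    decay=0.85, guide=None, seed=0):
    """Toy sampling-based minimiser with a shrinking search radius.

    decay=1.0 keeps the radius fixed ("standard" sampling).
    guide(x), if given, is an extra penalty whose weight fades from 1 to 0,
    a stand-in for curriculum-style virtual guidance. Requires iters >= 2.
    """
    rng = random.Random(seed)
    best, sigma = list(x0), sigma0
    for k in range(iters):
        weight = 1.0 - k / (iters - 1) if guide else 0.0

        def score(x):
            return cost(x) + (weight * guide(x) if guide else 0.0)

        # Keep the incumbent so the score never gets worse within a step.
        candidates = [best] + [
            [xi + rng.gauss(0.0, sigma) for xi in best] for _ in range(samples)
        ]
        best = min(candidates, key=score)
        sigma *= decay  # anneal: coarse search first, fine search later
    return best


def double_well(x):
    # Two equally good minima at x = -1 and x = +1 ("two contact modes").
    return (x[0] ** 2 - 1.0) ** 2


def prefer_right(x):
    # Hypothetical guidance term pulling toward the preferred mode at +1.
    return 4.0 * (x[0] - 1.0) ** 2


plain = annealed_search(double_well, [-0.8])
guided = annealed_search(double_well, [-0.8], guide=prefer_right)
print("annealed only:", plain, "with guidance:", guided)
```

Starting in the left well, the unguided run is expected to settle at the nearby minimum, while the guided run is pulled toward the preferred minimum before the guidance fades out. Passing `decay=1.0` gives the fixed-radius variant for comparison.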