GenDexHand: Generative Simulation for Dexterous Hands

Feng Chen, Zhuxiu Xu, Tianzhe Chu, Xunzhe Zhou, Li Sun, Zewen Wu, Shenghua Gao, Zhongyu Li, Yanchao Yang, Yi Ma

TL;DR

GenDexHand tackles data scarcity in dexterous hand manipulation by presenting a fully automated generative simulation pipeline that creates diverse tasks and environments in simulation. It combines task proposal by foundation models, multimodal model refinement, and hierarchical trajectory generation to produce high-quality dexterous hand data, achieving a substantial average improvement in task success via a motion-planning–RL hybrid and subtask decomposition. The approach yields richer task diversity and scalable data generation, addressing a key bottleneck in dexterous embodied intelligence. This work empowers scalable, simulation-based training for dexterous hands and lays groundwork for broader adoption of generative models in complex robotic manipulation.

Abstract

Data scarcity remains a fundamental bottleneck for embodied intelligence. Existing approaches use large language models (LLMs) to automate gripper-based simulation generation, but they transfer poorly to dexterous manipulation, which demands more specialized environment design. Meanwhile, dexterous manipulation tasks are inherently more difficult due to their higher degrees of freedom. Generating feasible and trainable dexterous hand tasks at scale remains an open challenge. To this end, we present GenDexHand, a generative simulation pipeline that autonomously produces diverse robotic tasks and environments for dexterous manipulation. GenDexHand introduces a closed-loop refinement process that adjusts object placements and scales based on vision-language model (VLM) feedback, substantially improving the average quality of generated environments. Each task is further decomposed into sub-tasks to enable sequential reinforcement learning, reducing training time and increasing success rates. Our work provides a viable path toward scalable training of diverse dexterous hand behaviors in embodied intelligence by offering a simulation-based solution to synthetic data generation. Our website: https://winniechen2002.github.io/GenDexHand/.

Paper Structure

This paper contains 21 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: A showcase of 15 diverse and realistic task scenes automatically generated by GenDexHand.
  • Figure 2: Overview of the GenDexHand pipeline for task generation. The process consists of four stages: Environment Proposal, Environment Creation, MLLM Refinement, and Trajectory Generation. Embodied assets and object assets are first provided to the Generator to produce an environment proposal. The simulator then renders multi-view images of the proposed scene, which are refined using an MLLM. Finally, the refined environment and proposal are combined to generate the resulting dexterous hand trajectory.
  • Figure 3: Two examples of task refinement using MLLM. Modification directives include Scale_Action, formatted as object - scale value, Position_Action, formatted as object - move_[x/y/z] value, and Pose_Action, formatted as object - rotate_[x/y/z] value.
  • Figure 4: Bar chart comparing three tasks: “Open Cabinet,” “Pick up Bottle,” and “Put the Apple into Bowl.” The Y-axis denotes the success rate $\uparrow$ and the number of environment steps $\downarrow$ required to collect 1000 successful trajectories in evaluation. Four methods are evaluated: (i) w/o subgoal, baseline RL without subtask decomposition; (ii) w/ subgoals, RL with tasks decomposed into short-horizon subgoals; (iii) w/ freeze-DOFs, RL with selective freezing of redundant degrees of freedom; and (iv) w/ motion planning (Ours), approaching subtasks using motion planning instead.
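
The modification directives named in the Figure 3 caption (Scale_Action, Position_Action, Pose_Action) can be illustrated with a small parser. The following is a minimal sketch, not the authors' implementation: the exact textual form of a directive, the `ObjectState` container, the function names, the multiplicative treatment of scale, and the use of degrees for rotation are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumed textual form, modeled on the Figure 3 caption:
#   "<object> - scale <value>"
#   "<object> - move_<x|y|z> <value>"
#   "<object> - rotate_<x|y|z> <value>"
_DIRECTIVE = re.compile(
    r"^\s*(?P<obj>.+?)\s+-\s+"
    r"(?P<op>scale|move_[xyz]|rotate_[xyz])\s+"
    r"(?P<val>[-+]?\d*\.?\d+)\s*$"
)


@dataclass
class ObjectState:
    """Hypothetical per-object scene state (not from the paper)."""
    position: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    rotation: list = field(default_factory=lambda: [0.0, 0.0, 0.0])  # degrees (assumed)
    scale: float = 1.0


def parse_directive(line):
    """Split one directive into (object name, operation, numeric value)."""
    m = _DIRECTIVE.match(line)
    if m is None:
        raise ValueError(f"unrecognized directive: {line!r}")
    return m["obj"], m["op"], float(m["val"])


def apply_directive(scene, line):
    """Apply one directive in place to a dict mapping object name -> ObjectState."""
    name, op, value = parse_directive(line)
    obj = scene[name]
    if op == "scale":
        # Assumption: the value is a multiplicative factor on the current scale.
        obj.scale *= value
    else:
        kind, axis = op.split("_")
        idx = "xyz".index(axis)
        # Assumption: move/rotate values are offsets added to the current pose.
        target = obj.position if kind == "move" else obj.rotation
        target[idx] += value


scene = {"apple": ObjectState()}
for directive in ["apple - scale 0.5", "apple - move_x -0.05", "apple - rotate_z 90"]:
    apply_directive(scene, directive)
```

In a closed-loop setting, a loop of this kind would re-render the scene after applying the directives and query the MLLM again. That outer loop is omitted here because its stopping criterion is not specified in this section.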