DexterityGen: Foundation Controller for Unprecedented Dexterity

Zhao-Heng Yin; Changhao Wang; Luis Pineda; Francois Hogan; Krishna Bodduluri; Akash Sharma; Patrick Lancaster; Ishita Prasad; Mrinal Kalakrishnan; Jitendra Malik; Mike Lambeta; Tingfan Wu; Pieter Abbeel; Mustafa Mukadam

TL;DR

DexterityGen (DexGen) tackles dexterous in-hand manipulation by using reinforcement learning in simulation to generate a large-scale dataset of low-level motion primitives, then distilling that dataset into a diffusion-based motion generator. The resulting controller translates coarse, externally provided motion prompts into safe, fine-grained finger motions, and an inverse dynamics module converts those motions into executable actions. Gradient guidance during diffusion sampling keeps the generated motion close to the input command while maintaining stability. In real-world tests with teleoperation prompts, DexGen produces dexterous behaviors including object reorientation and tool use (e.g., pen, syringe, screwdriver), and it acts as a form of shared autonomy that stabilizes contact and protects the grasp. The work suggests a practical, scalable path toward a foundation controller for dexterous robotics that couples high-level semantic guidance with reliable low-level execution across diverse objects and tasks. It also outlines directions for future improvement, such as integrating touch sensing and vision.
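The gradient-guided sampling mentioned above can be illustrated with a toy sketch. This is not the authors' implementation: the Gaussian "safe motion" prior, the closed-form denoiser, the simple re-noising sampler, and every name and parameter below (`denoise`, `guided_sample`, `guidance_scale`, `prior_std`) are assumptions chosen only to show how a guidance term can pull samples toward a commanded target while a prior keeps them near previously seen motions.

```python
import numpy as np

def denoise(x, sigma, prior_mean, prior_std):
    """Exact posterior mean E[x0 | x] for a Gaussian prior N(prior_mean, prior_std^2 I)
    when x = x0 + sigma * noise. Stands in for a learned denoiser (assumption)."""
    shrink = prior_std**2 / (prior_std**2 + sigma**2)
    return prior_mean + shrink * (x - prior_mean)

def guided_sample(target, guidance_scale, steps=50, prior_std=0.1, seed=0):
    """Toy guided sampler: at each noise level, predict the clean sample,
    nudge it along the gradient of -0.5 * ||x0 - target||^2, then re-noise."""
    rng = np.random.default_rng(seed)
    prior_mean = np.zeros_like(target)        # hypothetical "safe" mean motion
    x = rng.standard_normal(target.shape)     # start from noise (sigma = 1)
    for k in range(steps, 0, -1):
        sigma = k / steps
        x0_hat = denoise(x, sigma, prior_mean, prior_std)
        # Guidance: the gradient of -0.5 * ||x0_hat - target||^2 is (target - x0_hat).
        x0_hat = x0_hat + guidance_scale * (target - x0_hat)
        sigma_next = (k - 1) / steps          # reaches 0 on the final step
        x = x0_hat + sigma_next * rng.standard_normal(target.shape)
    return x

if __name__ == "__main__":
    command = np.array([0.5, 0.0])            # hypothetical commanded keypoint offset
    guided = guided_sample(command, guidance_scale=0.2)
    unguided = guided_sample(command, guidance_scale=0.0)
    print("guided distance to command:  ", np.linalg.norm(guided - command))
    print("unguided distance to command:", np.linalg.norm(unguided - command))
```

In this toy setting the guided sample lands between the prior mean and the command, which mirrors the trade-off described in the TL;DR: the command is followed, but not at the expense of leaving the region the prior considers safe. A larger `guidance_scale` weights the command more heavily.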

Abstract

Teaching robots dexterous manipulation skills, such as tool use, presents a significant challenge. Current approaches can be broadly categorized into two strategies: human teleoperation (for imitation learning) and sim-to-real reinforcement learning (RL). The first approach is difficult because it is hard for humans to produce safe and dexterous motions on a different embodiment without touch feedback. The second, RL-based approach struggles with the domain gap and requires highly task-specific reward engineering for complex tasks. Our key insight is that RL is effective at learning low-level motion primitives, while humans excel at providing coarse motion commands for complex, long-horizon tasks. Therefore, the optimal solution might be a combination of both approaches. In this paper, we introduce DexterityGen (DexGen), which uses RL to pretrain large-scale dexterous motion primitives, such as in-hand rotation or translation. We then leverage this learned dataset to train a dexterous foundational controller. In the real world, we use human teleoperation as a prompt to the controller to produce highly dexterous behavior. We evaluate the effectiveness of DexGen in both simulation and the real world, demonstrating that it is a general-purpose controller that can realize input dexterous manipulation commands and that it improves stability by 10-100x, measured as the duration for which objects are held, across diverse tasks. Notably, with DexGen we demonstrate, for the first time, unprecedented dexterous skills including diverse object reorientation and dexterous use of tools such as a pen, syringe, and screwdriver.
