Dex1B: Learning with 1B Demonstrations for Dexterous Manipulation

Jianglong Ye, Keyi Wang, Chengjing Yuan, Ruihan Yang, Yiquan Li, Jiyue Zhu, Yuzhe Qin, Xueyan Zou, Xiaolong Wang

TL;DR

Dex1B presents a scalable data-generation framework that combines optimization-based seed data with a conditional variational autoencoder (CVAE) to produce one billion demonstrations for dexterous grasping and articulation. The DexSimple baseline, which uses geometric constraints and post-optimization, achieves state-of-the-art performance and demonstrates strong sim-to-real transfer. The work provides a comprehensive benchmark, extensive analyses of data diversity and scaling, and a practical pipeline for generating diverse dexterous demonstrations at high volume with real-world applicability. Overall, it advances data-centric approaches to dexterous manipulation and offers practical pathways for training robust, transferable policies.

Abstract

Generating large-scale demonstrations for dexterous hand manipulation remains challenging, and several approaches have been proposed in recent years to address this. Among them, generative models have emerged as a promising paradigm, enabling the efficient creation of diverse and physically plausible demonstrations. In this paper, we introduce Dex1B, a large-scale, diverse, and high-quality demonstration dataset produced with generative models. The dataset contains one billion demonstrations for two fundamental tasks: grasping and articulation. To construct it, we propose a generative model that integrates geometric constraints to improve feasibility and applies additional conditions to enhance diversity. We validate the model on both established and newly introduced simulation benchmarks, where it significantly outperforms prior state-of-the-art methods. Furthermore, we demonstrate its effectiveness and robustness through real-world robot experiments. Our project page is at https://jianglongye.com/dex1b

Paper Structure

This paper contains 19 sections, 16 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: The Dex1B benchmark consists of 1B generated high-quality demonstrations for grasping (top) and articulation (middle) tasks. At the bottom, we show the direct sim-to-real transfer results of our method DexSimple trained on Dex1B. This demonstrates that Dex1B is scalable and generalizable to real environments.
  • Figure 2: Dex1B demonstration collection. The engine takes object assets and hand pose initializations as input and uses a control-based optimization algorithm to generate the Seed dataset. The Seed dataset is then used as training data for DexSimple or, in the last iteration, is output as Dex1B. DexSimple then generates a scaled proposal dataset, with $\pi$ as the scaling ratio. A simulation critic and a debiasing algorithm are applied to the proposal dataset to create the debiased dataset, which is passed to optimization refinement.
  • Figure 3: DexSimple pipeline. Our model takes hand parameters and object point clouds as fixed inputs to the CVAE, with root rotation, translation, and joint values as optional conditions. These are combined into the input embeddings for the CVAE, and the point cloud embeddings are re-emphasized in the latent space. The CVAE output, passed through forward kinematics, defines the hand pose trajectory, which is optimized with the loss function.
  • Figure 4: Diverse demonstrations for objects from train/test splits. We show only the contact frame for clarity.
  • Figure 5: Lifting trajectory from the Dex1B dataset.
  • ...and 7 more figures
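
The collection loop described in the Figure 2 caption can be sketched as a toy program. This is only an illustration of the control flow (seed generation, generator training, scaled proposals, simulation critic, debiasing, refinement), not the authors' implementation. Every function name here is hypothetical, a "demonstration" is reduced to a single float, the parameter `scale_ratio` stands in for $\pi$, and replacing the dataset on each iteration rather than accumulating it is an assumption.

```python
import random


def generate_seed(num_demos, rng):
    """Toy stand-in for the optimization engine: each demo is one float in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(num_demos)]


def train_generator(dataset):
    """Toy stand-in for training the generative model: fit a Gaussian (mean, std)."""
    mean = sum(dataset) / len(dataset)
    var = sum((x - mean) ** 2 for x in dataset) / len(dataset)
    return mean, var ** 0.5


def propose(model, num_demos, rng):
    """Sample proposal demos from the fitted toy generator."""
    mean, std = model
    return [rng.gauss(mean, std) for _ in range(num_demos)]


def simulation_critic(demo):
    """Toy success check (assumption): a demo succeeds if it lies in [-1, 1]."""
    return -1.0 <= demo <= 1.0


def debias(demos, num_bins=10, max_per_bin=50):
    """Toy debiasing (assumption): cap how many demos each bin of [-1, 1] contributes."""
    counts = [0] * num_bins
    kept = []
    for demo in demos:
        bin_index = min(int((demo + 1.0) / 2.0 * num_bins), num_bins - 1)
        if counts[bin_index] < max_per_bin:
            counts[bin_index] += 1
            kept.append(demo)
    return kept


def refine(demo):
    """Toy stand-in for optimization refinement: round to three decimals."""
    return round(demo, 3)


def build_dataset(num_seed=100, scale_ratio=4, num_iterations=3, seed=0):
    """Run the seed -> train -> propose -> critic -> debias -> refine loop."""
    rng = random.Random(seed)
    dataset = generate_seed(num_seed, rng)
    for _ in range(num_iterations):
        model = train_generator(dataset)
        # The proposal set is scale_ratio times larger than the current dataset.
        proposals = propose(model, scale_ratio * len(dataset), rng)
        successful = [d for d in proposals if simulation_critic(d)]
        dataset = [refine(d) for d in debias(successful)]
    return dataset


if __name__ == "__main__":
    final_dataset = build_dataset()
    print(len(final_dataset))
```

In this sketch the per-bin cap plays the role of the debiasing step by preventing the generator's most common outputs from dominating the next round of training data; the actual debiasing criterion used for Dex1B is not specified in the captions above.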