UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data

Sizhe Yang; Yiman Xie; Zhixuan Liang; Yang Tian; Jia Zeng; Dahua Lin; Jiangmiao Pang

TL;DR

UltraDexGrasp's data-generation pipeline integrates optimization-based grasp synthesis with planning-based demonstration generation, yielding high-quality, diverse trajectories across multiple grasp strategies; a simple yet effective grasp policy trained on this data achieves robust zero-shot sim-to-real transfer.

Abstract

Grasping is a fundamental capability for robots to interact with the physical world. Humans, equipped with two hands, autonomously select appropriate grasp strategies based on the shape, size, and weight of objects, enabling robust grasping and subsequent manipulation. In contrast, current robotic grasping remains limited, particularly in multi-strategy settings. Although substantial efforts have targeted parallel-gripper and single-hand grasping, dexterous grasping for bimanual robots remains underexplored, with data being a primary bottleneck. Achieving physically plausible and geometrically conforming grasps that can withstand external wrenches poses significant challenges. To address these issues, we introduce UltraDexGrasp, a framework for universal dexterous grasping with bimanual robots. The proposed data-generation pipeline integrates optimization-based grasp synthesis with planning-based demonstration generation, yielding high-quality and diverse trajectories across multiple grasp strategies. With this framework, we curate UltraDexGrasp-20M, a large-scale, multi-strategy grasp dataset comprising 20 million frames across 1,000 objects. Based on UltraDexGrasp-20M, we further develop a simple yet effective grasp policy that takes point clouds as input, aggregates scene features via unidirectional attention, and predicts control commands. Trained exclusively on synthetic data, the policy achieves robust zero-shot sim-to-real transfer and consistently succeeds on novel objects with varied shapes, sizes, and weights, attaining an average success rate of 81.2% in real-world universal dexterous grasping. To facilitate future research on grasping with bimanual robots, we open-source the data generation pipeline at https://github.com/InternRobotics/UltraDexGrasp.
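The abstract states that the policy aggregates scene features via unidirectional attention before predicting control commands, but this excerpt does not define the mechanism. The following is a minimal NumPy sketch under one plausible reading: encoded point-cloud tokens and a few query tokens share one attention layer, with a mask that lets the query tokens read the scene while the scene tokens cannot read the queries. It is single-head with random weights; the function names, the query tokens, and all sizes are illustrative assumptions, not the paper's architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax; masked entries set to -inf get weight 0."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def unidirectional_attention(scene_tokens, query_tokens, w_q, w_k, w_v):
    """Single-head attention over [scene; query] tokens with a one-way mask.

    Assumption: "unidirectional" means query tokens may attend to every token,
    while scene tokens attend only to other scene tokens, so scene features
    never depend on the query tokens.
    """
    tokens = np.concatenate([scene_tokens, query_tokens], axis=0)  # (N+M, D)
    n_scene, n_total = scene_tokens.shape[0], tokens.shape[0]
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])                        # (N+M, N+M)
    # allowed[i, j] is True when token i may attend to token j.
    allowed = np.ones((n_total, n_total), dtype=bool)
    allowed[:n_scene, n_scene:] = False  # block scene -> query attention
    scores = np.where(allowed, scores, -np.inf)
    out = softmax(scores, axis=-1) @ v
    return out[:n_scene], out[n_scene:]


# Usage with random weights (all sizes are illustrative).
rng = np.random.default_rng(0)
D = 32
scene = rng.normal(size=(128, D))    # stand-in for encoded point-cloud features
queries = rng.normal(size=(4, D))    # hypothetical learnable query tokens
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
scene_out, query_out = unidirectional_attention(scene, queries, w_q, w_k, w_v)
print(scene_out.shape, query_out.shape)  # (128, 32) (4, 32)
```

Because of the one-way mask, perturbing the query tokens leaves the scene outputs unchanged. A complete policy would follow this layer with an action head that maps the query features to control commands.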

Paper Structure (26 sections, 9 equations, 6 figures, 3 tables)

Introduction
Related Work
Preliminaries
Definition of Grasp Pose for Bimanual Robots
Basics of Grasp Modeling
Universal Dexterous Grasp Dataset
Grasp Synthesis
Demonstration Generation
Universal Dexterous Grasp Policy
Overall Architecture
Point Cloud Encoding
Action Prediction
Experiments
Simulation Experiments
Experimental Setup
...and 11 more sections

Figures (6)

Figure 1: Overview. UltraDexGrasp is a framework for universal dexterous grasping with bimanual robots. The proposed data generation pipeline integrates an optimization-based grasp synthesizer with a planning-based demonstration generation module, and supports multiple grasp strategies, including two-finger pinch, three-finger tripod, whole-hand grasp, and bimanual grasp. Trained on data produced by this pipeline, the policy demonstrates robust zero-shot sim-to-real transfer and strong generalization to novel objects with varied shapes, sizes, and weights.
Figure 2: Overview of data generation pipeline. We first collect diverse object assets and import the objects and the robot URDF files into the simulator. An optimization-based grasp synthesizer is then used to generate feasible grasps, from which the preferred grasp is selected. Finally, motion planning is employed to generate demonstration trajectories.
Figure 3: Hand contact points for various grasp strategies. Different grasp strategies select distinct fingertip contact points, which are used to compute energy terms in the optimization process of grasp synthesis.
Figure 4: Overview of the policy architecture. The proposed grasp policy takes point clouds as input, encodes them using a point encoder, aggregates scene features via unidirectional attention, and predicts control commands. The policy supports multiple grasp strategies and improves generalization across diverse objects.
Figure 5: The variation in performance with growing amounts of training data. The performance of the policy consistently improves as the volume of data increases.
...and 1 more figure
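The Figure 3 caption notes that each grasp strategy selects its own fingertip contact points, which then enter the energy terms of the grasp-synthesis optimization. As a minimal sketch, assuming a contact-distance energy of the kind commonly used in optimization-based dexterous grasp synthesis (this excerpt does not show the paper's actual terms), the snippet below sums, over the contacts a strategy selects, the distance to the nearest sampled object-surface point. The strategy-to-contact table, the six-point contact ordering, and all names are hypothetical.

```python
import numpy as np

# Hypothetical strategy table: which contact points each strategy uses.
# Each hand's contacts are assumed ordered [thumb, index, middle, ring, little, palm].
STRATEGY_CONTACTS = {
    "pinch": {"right": [0, 1]},               # thumb + index
    "tripod": {"right": [0, 1, 2]},           # thumb + index + middle
    "whole_hand": {"right": [0, 1, 2, 3, 4, 5]},
    "bimanual": {"left": [0, 1, 2, 3, 4, 5], "right": [0, 1, 2, 3, 4, 5]},
}


def contact_distance_energy(hand_contacts, object_points, strategy):
    """Sum over selected contacts of the distance to the nearest object point.

    hand_contacts: dict mapping hand name -> (6, 3) array of contact positions.
    object_points: (P, 3) array sampled from the object surface.
    """
    energy = 0.0
    for hand, idx in STRATEGY_CONTACTS[strategy].items():
        pts = hand_contacts[hand][idx]                        # (K, 3)
        diff = pts[:, None, :] - object_points[None, :, :]    # (K, P, 3)
        nearest = np.linalg.norm(diff, axis=-1).min(axis=1)   # (K,)
        energy += nearest.sum()
    return float(energy)


# Usage: a unit-sphere "object" and contacts placed on or near its surface.
rng = np.random.default_rng(0)
obj = rng.normal(size=(500, 3))
obj /= np.linalg.norm(obj, axis=1, keepdims=True)
contacts = {"right": obj[:6].copy(), "left": obj[6:12] * 1.2}
print(contact_distance_energy(contacts, obj, "tripod"))    # 0.0 (contacts on surface)
print(contact_distance_energy(contacts, obj, "bimanual"))  # about 1.2 (left hand 0.2 off)
```

A full synthesizer would typically minimize such a term jointly with others (for example penetration or force-closure penalties) over hand pose and joint angles; keeping the strategy table separate from the energy is what lets one optimizer serve pinch, tripod, whole-hand, and bimanual grasps.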
