BODex: Scalable and Efficient Robotic Dexterous Grasp Synthesis Using Bilevel Optimization

Jiayi Chen, Yubin Ke, He Wang

TL;DR

This paper tackles the data bottleneck in dexterous grasping by introducing BODex, a scalable, GPU-accelerated grasp synthesis system formulated as a bilevel optimization with a lower-level QCQP $Q_j$ and an upper-level pose optimizer. It achieves high throughput (over $49$ grasps/s on a single RTX 3090) and strong simulation performance (SSR $>75\%$) across multiple hands, while delivering a large, high-quality dataset and a reproducible MuJoCo benchmark. The method replaces the friction and force assumptions of prior grasp energies with a differentiable, QP-based energy, uses a coarse-to-fine contact model, and relies on batched GPU solvers for speed, enabling large-scale data generation and learning-based improvements. Real-world tests on a Shadow Hand reach $81\%$ success across $20$ objects, and networks trained on BODex substantially outperform those trained on prior datasets, underscoring the approach's practical impact for dexterous manipulation and data-driven policy learning.

Abstract

Robotic dexterous grasping is important for interacting with the environment. To unleash the potential of data-driven models for dexterous grasping, a large-scale, high-quality dataset is essential. While gradient-based optimization offers a promising way to construct such datasets, previous works suffer from limitations such as inefficiency, strong assumptions in the grasp quality energy, or limited object sets for experiments. Moreover, the lack of a standard benchmark for comparing different methods and datasets hinders progress in this field. To address these challenges, we develop a highly efficient synthesis system and a comprehensive benchmark with MuJoCo for dexterous grasping. We formulate grasp synthesis as a bilevel optimization problem, combining a novel lower-level quadratic program (QP) with an upper-level gradient descent process. By leveraging recent advances in CUDA-accelerated robotic libraries and GPU-based QP solvers, our system can parallelize thousands of grasps and synthesize over 49 grasps per second on a single 3090 GPU. Our synthesized grasps for Shadow, Allegro, and Leap hands all achieve a success rate above 75% in simulation, with a penetration depth under 1 mm, outperforming existing baselines on nearly all metrics. Compared to the previous large-scale dataset, DexGraspNet, our dataset significantly improves the performance of learning models, raising the success rate from around 40% to 80% in simulation. Real-world testing of the trained model on the Shadow Hand achieves an 81% success rate across 20 diverse objects. The code and datasets are released on our project page: https://pku-epic.github.io/BODex.
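
To make the bilevel structure described in the abstract concrete, here is a minimal toy sketch in Python. It is not the BODex energy or solver. It assumes a unit disk object, frictionless point contacts parameterized by angles, a force-only target (no torque), and no hand kinematics or GPU batching. All names (`solve_lower_qp`, `synthesize`, `thetas`, `target`) are hypothetical. The lower level is a box-constrained QP over contact force magnitudes, solved here by projected gradient. The upper level is gradient descent on the contact angles. Because both levels minimize the same energy in this toy, the upper-level gradient is taken with the lower-level solution held fixed, which is a simplifying assumption of the sketch.

```python
import math

def residual(thetas, forces, target):
    """Net contact force minus the target force.
    Contact i sits at angle theta_i on a unit disk; its inward normal is (-cos, -sin)."""
    rx = -sum(f * math.cos(t) for f, t in zip(forces, thetas)) - target[0]
    ry = -sum(f * math.sin(t) for f, t in zip(forces, thetas)) - target[1]
    return rx, ry

def energy(thetas, forces, target):
    rx, ry = residual(thetas, forces, target)
    return rx * rx + ry * ry

def solve_lower_qp(thetas, forces, target, iters=100):
    """Lower level (toy): min_f ||sum_i f_i n_i - target||^2 s.t. 0 <= f_i <= 1,
    solved by projected gradient descent from the given starting forces."""
    # Step 1/L: the gradient's Lipschitz constant is at most 2 * trace(G^T G) = 2k.
    step = 1.0 / (2.0 * len(thetas))
    for _ in range(iters):
        rx, ry = residual(thetas, forces, target)
        grads = [2.0 * (-rx * math.cos(t) - ry * math.sin(t)) for t in thetas]
        forces = [min(1.0, max(0.0, f - step * g)) for f, g in zip(forces, grads)]
    return forces

def synthesize(thetas, target, outer_iters=300, lr=0.05):
    """Upper level (toy): gradient descent on contact angles, re-solving the QP each step."""
    forces = solve_lower_qp(thetas, [0.5] * len(thetas), target, iters=500)
    initial = energy(thetas, forces, target)
    for _ in range(outer_iters):
        rx, ry = residual(thetas, forces, target)
        # Energy derivative w.r.t. theta_i with forces fixed; d n_i / d theta_i = (sin, -cos).
        grads = [2.0 * f * (rx * math.sin(t) - ry * math.cos(t))
                 for f, t in zip(forces, thetas)]
        thetas = [t - lr * g for t, g in zip(thetas, grads)]
        forces = solve_lower_qp(thetas, forces, target)  # warm start from previous forces
    return thetas, forces, initial, energy(thetas, forces, target)

# Three contacts asked to produce a unit upward force on the disk.
thetas, forces, initial, final = synthesize([0.3, 2.0, 4.0], (0.0, 1.0))
print(f"energy before: {initial:.4f}, after: {final:.6f}")
```

Warm-starting the lower-level solve from the previous forces keeps the energy from increasing between outer iterations in this toy. A system like the one described in the abstract would instead run many such problems in parallel with a GPU-based QP solver.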

Paper Structure

This paper contains 22 sections, 5 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Comparison with analytic-based dexterous grasp synthesis baselines on Allegro Hand. Our pipeline significantly outperforms baselines on almost all metrics, especially on the two most important ones, simulation success rate and speed.
  • Figure 2: Coarse-to-fine Strategy.
  • Figure 3: Visualization of Randomly Selected Grasps. Previous analytic-based synthesis methods show more penetration (green circles), fingers that often do not contact the object (orange circles), and some unnatural poses (black boxes).
  • Figure 4: More visualization of our dataset.
  • Figure 5: Comparison of different grasp energies.
  • ...and 3 more figures