BODex: Scalable and Efficient Robotic Dexterous Grasp Synthesis Using Bilevel Optimization
Jiayi Chen, Yubin Ke, He Wang
TL;DR
This paper tackles the data bottleneck in dexterous grasping by introducing BODex, a scalable, GPU-accelerated grasp synthesis system formulated as a bilevel optimization: a lower-level quadratic program (QP) evaluates grasp quality, and an upper-level gradient descent process optimizes the hand pose. It achieves high throughput (over 49 grasps per second on a single RTX 3090) and a simulation success rate above 75% for the Shadow, Allegro, and Leap hands, and it delivers a large, high-quality dataset together with a reproducible MuJoCo benchmark. The method replaces the strong friction and force assumptions of prior grasp quality energies with a differentiable, QP-based energy, uses a coarse-to-fine contact model, and relies on batched GPU solvers for speed, which enables large-scale data generation for learning. In real-world tests, a model trained on the dataset reaches an 81% success rate on a Shadow Hand across 20 objects, and networks trained on BODex data substantially outperform those trained on DexGraspNet, with simulation success rising from around 40% to 80%.
Abstract
Robotic dexterous grasping is important for interacting with the environment. To unleash the potential of data-driven models for dexterous grasping, a large-scale, high-quality dataset is essential. While gradient-based optimization offers a promising way to construct such datasets, previous works suffer from limitations such as inefficiency, strong assumptions in the grasp quality energy, or limited object sets for experiments. Moreover, the lack of a standard benchmark for comparing different methods and datasets hinders progress in this field. To address these challenges, we develop a highly efficient synthesis system and a comprehensive MuJoCo-based benchmark for dexterous grasping. We formulate grasp synthesis as a bilevel optimization problem, combining a novel lower-level quadratic program (QP) with an upper-level gradient descent process. By leveraging recent advances in CUDA-accelerated robotic libraries and GPU-based QP solvers, our system can parallelize thousands of grasps and synthesize over 49 grasps per second on a single RTX 3090 GPU. Our synthesized grasps for the Shadow, Allegro, and Leap hands all achieve a success rate above 75% in simulation with a penetration depth under 1 mm, outperforming existing baselines on nearly all metrics. Compared to the previous large-scale dataset, DexGraspNet, our dataset significantly improves the performance of learning models, raising the success rate in simulation from around 40% to 80%. Real-world testing of the trained model on the Shadow Hand achieves an 81% success rate across 20 diverse objects. The code and datasets are released on our project page: https://pku-epic.github.io/BODex.
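To make the bilevel structure concrete, the following is a minimal sketch on a toy problem invented for illustration; it is not the BODex energy, contact model, or solver. It assumes frictionless point contacts on a 2D unit disc, a box-constrained least-squares QP as the lower level (solved here by plain projected gradient descent rather than a batched GPU QP solver), and an upper level that moves the contact angles by gradient descent. All names (`normal`, `solve_lower_qp`, `energy_and_grad`, `synthesize`) and all parameter values are hypothetical.

```python
import math


def normal(theta):
    """Inward unit normal of a unit disc at the contact with polar angle theta."""
    return (-math.cos(theta), -math.sin(theta))


def solve_lower_qp(thetas, w, f_max=1.0, iters=100):
    """Lower level: min_f ||sum_i f_i * n_i - w||^2  subject to 0 <= f_i <= f_max.

    Solved by projected gradient descent. Returns the contact force
    magnitudes f and the residual r = sum_i f_i * n_i - w.
    """
    normals = [normal(t) for t in thetas]
    f = [0.0] * len(thetas)
    # The Hessian 2 * N^T N has eigenvalues <= 2 * len(thetas) because the
    # columns of N are unit vectors, so this step size is at most 1 / L.
    step = 1.0 / (2.0 * len(thetas))

    def residual(forces):
        rx = sum(fi * n[0] for fi, n in zip(forces, normals)) - w[0]
        ry = sum(fi * n[1] for fi, n in zip(forces, normals)) - w[1]
        return rx, ry

    for _ in range(iters):
        rx, ry = residual(f)
        f = [
            min(f_max, max(0.0, fi - step * 2.0 * (rx * n[0] + ry * n[1])))
            for fi, n in zip(f, normals)
        ]
    return f, residual(f)


def energy_and_grad(thetas, targets, f_max=1.0):
    """Energy = sum over target wrenches of the lower-level optimal value.

    The gradient w.r.t. the contact angles holds the lower-level forces
    fixed at their (approximate) optimum, which is valid by the envelope
    theorem because the box constraints do not depend on thetas.
    """
    energy = 0.0
    grad = [0.0] * len(thetas)
    for w in targets:
        f, (rx, ry) = solve_lower_qp(thetas, w, f_max)
        energy += rx * rx + ry * ry
        for i, t in enumerate(thetas):
            # d n / d theta = (sin t, -cos t)
            grad[i] += 2.0 * f[i] * (rx * math.sin(t) - ry * math.cos(t))
    return energy, grad


def synthesize(thetas, targets, lr=0.05, steps=200):
    """Upper level: gradient descent on the contact angles."""
    history = []
    for _ in range(steps):
        energy, grad = energy_and_grad(thetas, targets)
        history.append(energy)
        thetas = [t - lr * g for t, g in zip(thetas, grad)]
    return thetas, history


if __name__ == "__main__":
    # Four unit target forces the toy grasp should be able to produce.
    targets = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    thetas, history = synthesize([0.3, 2.0, 4.5], targets)
    print("initial energy:", history[0], "final energy:", history[-1])
```

In this sketch each candidate grasp is optimized sequentially on the CPU. The throughput reported above comes from solving many such lower-level problems in parallel on a GPU, which this toy version does not attempt to reproduce. Like other gradient-based formulations, the toy energy can also stall in local minima when a contact carries zero force for every target.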
