GraspXL: Generating Grasping Motions for Diverse Objects at Scale
Hui Zhang, Sammy Christen, Zicong Fan, Otmar Hilliges, Jie Song
TL;DR
GraspXL presents a scalable reinforcement-learning framework for generating grasping motions that satisfy multiple objectives across diverse objects and dexterous hands without relying on hand–object interaction data. It combines objective-driven hand guidance, a curriculum learning strategy, and distance-based object features to generalize to over 500k unseen objects, including generated or reconstructed meshes. The approach achieves high grasp success and close adherence to targets on PartNet, ShapeNet, and Objaverse, while transferring across MANO, Shadow, Allegro, and Faive hands. The authors release code, pretrained policies, and a large-scale dataset of generated grasp motions to enable downstream research and applications.
Abstract
Human hands possess the dexterity to interact with diverse objects, for example by grasping specific parts of an object or approaching it from a desired direction. More importantly, humans can grasp objects of any shape without object-specific skills. Recent works synthesize grasping motions following single objectives such as a desired approach heading direction or a grasping area. Moreover, they usually rely on expensive 3D hand-object data during training and inference, which limits their capability to synthesize grasping motions for unseen objects at scale. In this paper, we unify the generation of hand-object grasping motions across multiple motion objectives, diverse object shapes, and dexterous hand morphologies in a policy learning framework, GraspXL. The objectives are composed of the graspable area, heading direction during approach, wrist rotation, and hand position. Without requiring any 3D hand-object interaction data, our policy, trained with 58 objects, can robustly synthesize diverse grasping motions for more than 500k unseen objects with a success rate of 82.2%. At the same time, the policy adheres to objectives, which enables the generation of diverse grasps per object. Moreover, we show that our framework can be deployed to different dexterous hands and work with reconstructed or generated objects. We evaluate our method quantitatively and qualitatively to show its efficacy. Our model, code, and the large-scale generated motions are available at https://eth-ait.github.io/graspxl/.
