ScaleADFG: Affordance-based Dexterous Functional Grasping via Scalable Dataset
Sizhe Wang, Yifan Yang, Yongkang Luo, Daheng Li, Wei Wei, Yan Zhang, Peiying Hu, Yunjin Fu, Haonan Duan, Jia Sun, Peng Wang
TL;DR
ScaleADFG tackles dexterous functional grasping under scale variance by building ScaleADFG-Dataset through an affordance-based synthesis pipeline that leverages pretrained models for image segmentation, 3D asset generation, and affordance perception. It introduces ScaleADFG-Net, a lightweight CVAE-based grasp predictor trained on the synthetic dataset and capable of zero-shot transfer to real objects. The dataset spans 5 object categories and 2 hands, with over 1,000 shapes per category and 15 scales; after filtering, more than 60,000 high-quality grasps remain per hand. Experiments in simulation and on a real robot show strong adaptability to object scale, improved functional grasp stability and diversity, and effective real-world transfer, supporting the practicality of large-scale, multi-scale data for dexterous functional grasping.
Abstract
Dexterous functional tool-use grasping is essential for effective robotic manipulation of tools. However, existing approaches face significant challenges in efficiently constructing large-scale datasets and ensuring generalizability to everyday object scales. These issues primarily arise from size mismatches between robotic and human hands, and from the diversity of real-world object scales. To address these limitations, we propose the ScaleADFG framework, which consists of a fully automated dataset construction pipeline and a lightweight grasp generation network. Our dataset pipeline introduces an affordance-based algorithm to synthesize diverse tool-use grasp configurations without expert demonstrations, allowing flexible object-hand size ratios and enabling robotic hands that are larger than human hands to grasp everyday objects effectively. Additionally, we leverage pre-trained models to generate extensive 3D assets and facilitate efficient retrieval of object affordances. Our dataset comprises five object categories, each containing over 1,000 unique shapes with 15 scale variations. After filtering, the dataset includes over 60,000 grasps for each of two dexterous robotic hands. On top of this dataset, we train a lightweight, single-stage grasp generation network with a notably simple loss design, eliminating the need for post-refinement. That such a simple network suffices demonstrates the critical importance of large-scale datasets and multi-scale object variants for effective training. Extensive experiments in simulation and on a real robot confirm that the ScaleADFG framework exhibits strong adaptability to objects of varying scales, enhancing functional grasp stability, diversity, and generalizability. Moreover, our network exhibits effective zero-shot transfer to real-world objects. The project page is available at https://sizhe-wang.github.io/ScaleADFG_webpage
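To make the dataset figures concrete, the following minimal sketch works out the lower bound on scaled object variants implied by the numbers above and shows one plausible way to produce scale variants of a shape. The scale range (0.5 to 1.5), the even spacing, and the helper names `scale_factors`, `scale_points`, and `min_object_variants` are illustrative assumptions, not details taken from the paper.

```python
# Counts stated in the abstract; "over 1,000" shapes makes this a lower bound.
NUM_CATEGORIES = 5
MIN_SHAPES_PER_CATEGORY = 1000
NUM_SCALES = 15


def scale_factors(lo, hi, n):
    """Return n evenly spaced scale factors in [lo, hi].

    The range and the even spacing are assumptions for illustration only.
    """
    if n < 2:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def scale_points(points, factor):
    """Uniformly scale a list of (x, y, z) points about the origin."""
    return [(x * factor, y * factor, z * factor) for x, y, z in points]


def min_object_variants():
    """Lower bound on scaled object instances: categories x shapes x scales."""
    return NUM_CATEGORIES * MIN_SHAPES_PER_CATEGORY * NUM_SCALES


# Hypothetical scale range; the paper's actual range is not given here.
factors = scale_factors(0.5, 1.5, NUM_SCALES)
print(min_object_variants())  # 5 * 1000 * 15 = 75000
```

Under these assumptions, grasp synthesis would run over at least 75,000 scaled object instances per hand before filtering, which gives a sense of why a fully automated pipeline is needed.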
