
ScaleADFG: Affordance-based Dexterous Functional Grasping via Scalable Dataset

Sizhe Wang, Yifan Yang, Yongkang Luo, Daheng Li, Wei Wei, Yan Zhang, Peiying Hu, Yunjin Fu, Haonan Duan, Jia Sun, Peng Wang

TL;DR

ScaleADFG tackles dexterous functional grasping under scale variance by building ScaleADFG-Dataset through an affordance-based synthesis pipeline that leverages pretrained models for image segmentation, 3D asset generation, and affordance perception. It introduces ScaleADFG-Net, a lightweight CVAE-based grasp predictor trained on the synthetic dataset, capable of zero-shot transfer to real objects. The dataset spans 5 object categories and 2 hands with over 1,000 shapes per category and 15 scales, yielding robust diversity with more than 60,000 high-quality grasps per hand after filtering. Experimental results in simulation and on a real robot show strong adaptability to object scale, improved functional grasp stability and diversity, and effective real-world transfer, validating the practicality of large-scale, multi-scale data for dexterous functional grasping.
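To make "CVAE-based" concrete, the sketch below shows the two generic ingredients of a conditional variational autoencoder: sampling a latent code with the reparameterization trick, and the closed-form KL term against a standard normal prior. It is plain Python with no network. The function names, the squared-error reconstruction term and the flat grasp-parameter vectors are assumptions for illustration. This summary does not describe ScaleADFG-Net's actual architecture or loss beyond calling it lightweight and simple.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), independently per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def cvae_loss(pred_grasp, true_grasp, mu, logvar, kl_weight=1.0):
    """Generic CVAE objective: squared-error reconstruction plus weighted KL.

    pred_grasp / true_grasp are flat lists of grasp parameters (for example,
    wrist pose and joint angles); this parameterization is hypothetical.
    """
    recon = sum((p - t) ** 2 for p, t in zip(pred_grasp, true_grasp))
    return recon + kl_weight * kl_to_standard_normal(mu, logvar)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, logvar = [0.2, -0.1], [0.0, -1.0]
    z = reparameterize(mu, logvar, rng)
    # The KL term vanishes when the posterior equals the prior (mu = 0, logvar = 0).
    print(len(z), kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
```

At inference time, a CVAE of this kind typically drops the encoder, draws z from N(0, I) and feeds it with the object condition to the decoder. Drawing several z values is one way such a model can yield diverse grasps for the same object.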

Abstract

Dexterous functional tool-use grasping is essential for effective robotic manipulation of tools. However, existing approaches face significant challenges in efficiently constructing large-scale datasets and in generalizing to the scales of everyday objects. These issues arise primarily from the size mismatch between robotic and human hands and from the diversity of real-world object scales. To address these limitations, we propose the ScaleADFG framework, which consists of a fully automated dataset construction pipeline and a lightweight grasp generation network. The pipeline introduces an affordance-based algorithm that synthesizes diverse tool-use grasp configurations without expert demonstrations, allowing flexible object-hand size ratios and enabling robotic hands that are larger than human hands to grasp everyday objects effectively. Additionally, we leverage pre-trained models to generate extensive 3D assets and to retrieve object affordances efficiently. Our dataset comprises five object categories, each containing over 1,000 unique shapes with 15 scale variations. After filtering, it includes over 60,000 grasps for each of two dexterous robotic hands. On top of this dataset, we train a lightweight, single-stage grasp generation network with a notably simple loss design that requires no post-refinement. This result demonstrates the critical importance of large-scale datasets and multi-scale object variants for effective training. Extensive experiments in simulation and on a real robot confirm that the ScaleADFG framework exhibits strong adaptability to objects of varying scales, enhancing functional grasp stability, diversity, and generalizability. Moreover, our network achieves effective zero-shot transfer to real-world objects. The project page is available at https://sizhe-wang.github.io/ScaleADFG_webpage
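Figure 2's caption says affordances are retrieved "using dense correspondence". One minimal way to picture that step is nearest-neighbor label transfer. This assumes per-point feature descriptors are available for a labeled template object and for a newly generated object, and each target point copies the affordance label of the template point whose descriptor is closest. The feature source, the label names ("grasp", "function") and the brute-force search are hypothetical choices for illustration. They are not taken from the paper.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def transfer_affordance(
    template_feats: List[Sequence[float]],
    template_labels: List[str],
    target_feats: List[Sequence[float]],
) -> List[str]:
    """Give each target point the label of its nearest template descriptor.

    Brute-force O(N * M) search; larger point sets would typically use a
    KD-tree or GPU nearest-neighbor search instead.
    """
    if len(template_feats) != len(template_labels):
        raise ValueError("one label per template point is required")
    labels = []
    for f in target_feats:
        best = min(
            range(len(template_feats)),
            key=lambda i: squared_distance(f, template_feats[i]),
        )
        labels.append(template_labels[best])
    return labels


# Hypothetical 2-D descriptors and part labels, for illustration only.
template = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
labels = ["grasp", "grasp", "function"]
print(transfer_affordance(template, labels, [(0.2, 0.1), (4.8, 5.1)]))
```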

Paper Structure

This paper contains 41 sections, 14 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Adaptability on scale variance and shape variance for both hands.
  • Figure 2: Overview of the ScaleADFG pipeline for dexterous functional grasping. Construction of the large-scale ScaleADFG-Dataset comprises automated 3D object generation from internet images, affordance retrieval using dense correspondence, and optimization-based synthesis of functional grasps with axis-alignment initialization. A lightweight, conditional variational autoencoder (CVAE)-based ScaleADFG-Net, trained on the constructed dataset, predicts grasps for diverse object shapes and scales. In the real world, ScaleADFG-Net is used to infer grasp configurations. Hand and object affordances are illustrated, with matching colors indicating the alignment of functional parts and grasping parts.
  • Figure 3: Illustration of the proposed initialization method. The object (a) and hand (b) are each associated with two axes, with corresponding axes indicated by the same color to show the alignment between grasping and functional parts. (c) presents examples of the initialization results.
  • Figure 4: Evaluation of scale generalization performance. All tests use unseen scales, on both seen and unseen objects. Unseen scales are interpolated from existing scales in the dataset; for example, the scale interpolated between the 1st and 2nd scales is referred to as the 1.5th scale. Testing was conducted on five unseen scales (1.5, 4.5, 7.5, 10.5, and 14.5), covering both extrapolated and interpolated scales. Since lower is better for all metrics, a smaller area is preferable.
  • Figure 5: Comparison with DexFG [dexfg] on functional grasping across different object scales, demonstrating the enhanced scale adaptability of ScaleADFG. For a fair comparison, we define the entire object body as the grasping part for both methods during evaluation.
  • ...and 2 more figures
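Figure 3 describes an initialization in which the object and the hand each carry two axes, and corresponding axes are aligned. The sketch below shows one standard way to compute such an alignment. Each axis pair is turned into a right-handed orthonormal frame by Gram-Schmidt, and the rotation is R = F_obj F_hand^T, which maps the hand's first axis exactly onto the object's first axis and aligns the second axes within their common plane. The axis values and helper names are hypothetical. The paper's exact initialization, including any translation or scale handling, is not specified here.

```python
import math
from typing import List, Sequence

Vec = List[float]
Mat = List[List[float]]


def normalize(v: Sequence[float]) -> Vec:
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("zero-length axis (or parallel axis pair)")
    return [x / n for x in v]


def cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def frame(primary: Sequence[float], secondary: Sequence[float]) -> Mat:
    """Right-handed orthonormal frame (columns e1, e2, e3) from two non-parallel axes."""
    e1 = normalize(primary)
    d = sum(s * e for s, e in zip(secondary, e1))
    e2 = normalize([s - d * e for s, e in zip(secondary, e1)])  # Gram-Schmidt step
    e3 = cross(e1, e2)
    return [[e1[i], e2[i], e3[i]] for i in range(3)]


def align_rotation(hand_axes, obj_axes) -> Mat:
    """Rotation R = F_obj @ F_hand^T, so that R maps the hand frame onto the object frame."""
    fh, fo = frame(*hand_axes), frame(*obj_axes)
    return [[sum(fo[i][k] * fh[j][k] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(R: Mat, v: Sequence[float]) -> Vec:
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


# Hypothetical axes: (primary, secondary) for the hand and for the object.
hand_axes = ([0.0, 0.0, 1.0], [1.0, 0.0, 0.0])
obj_axes = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
R = align_rotation(hand_axes, obj_axes)
print(apply(R, hand_axes[0]), apply(R, hand_axes[1]))
```

Because both frames are built right-handed, R is a proper rotation (determinant +1) rather than a reflection. This matters if the result is used directly as an initial wrist orientation.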
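Figure 4 indexes scales from 1 to 15 and names the midpoint between the 1st and 2nd scales the 1.5th scale. A minimal sketch of that indexing is linear interpolation between neighboring scale factors, with indices outside the range extrapolated from the nearest end segment. The actual scale factors are not given in this summary. The evenly spaced SCALES list below is purely hypothetical, as is the choice of linear interpolation.

```python
import math
from typing import Sequence


def scale_at(scales: Sequence[float], index: float) -> float:
    """Scale factor at a fractional 1-based index, by linear interpolation.

    index 1.5 lies halfway between scales[0] and scales[1]. Indices outside
    [1, len(scales)] are extrapolated linearly from the nearest end segment.
    """
    n = len(scales)
    if n < 2:
        raise ValueError("need at least two scales")
    lo = min(max(math.floor(index), 1), n - 1)  # 1-based left end of the segment used
    t = index - lo
    return scales[lo - 1] + t * (scales[lo] - scales[lo - 1])


# Hypothetical: 15 evenly spaced scale factors from 0.6 to 2.0.
SCALES = [0.6 + 0.1 * k for k in range(15)]
print([round(scale_at(SCALES, i), 3) for i in (1.5, 4.5, 7.5, 10.5, 14.5)])
```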