GAT-Grasp: Gesture-Driven Affordance Transfer for Task-Aware Robotic Grasping

Ruixiang Wang, Huayi Zhou, Xinyue Yao, Guiliang Liu, Kui Jia

TL;DR

GAT-Grasp presents a gesture-conditioned, retrieval-based framework for task-aware robotic grasping. It maps human pointing and grasp gestures to region localization and orientation-constrained grasps without requiring object priors. It builds an Affordance Memory Bank from large-scale human-object interaction (HOI) video data and uses hierarchical retrieval to transfer hand-grasp affordances to target objects, followed by a hand-to-gripper rotation mapping and optional integration with existing grasp generators. The approach achieves robust zero-shot generalization and outperforms language- and vision-based baselines in cluttered real-world scenes, with ablations confirming the critical roles of pointing, visuospatial transfer, and rotation constraints. This work advances intuitive human-robot collaboration by enabling precise, task-specific grasps through non-verbal gestures and scalable affordance transfer.
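The TL;DR mentions optional integration with existing grasp generators under region and orientation constraints. The following is a minimal sketch of one way such a constraint filter could work. It assumes that the upstream generator returns planar candidates (center, yaw, score) and that the pointing gesture yields a circular region of interest. `GraspCandidate`, `select_grasp`, and the 20° tolerance are hypothetical names and values chosen for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class GraspCandidate:
    # Hypothetical planar grasp from an upstream generator (assumed format).
    x: float      # grasp center, x coordinate in the working plane
    y: float      # grasp center, y coordinate in the working plane
    yaw: float    # closing-axis angle in radians
    score: float  # quality score reported by the generator


def yaw_distance(a: float, b: float) -> float:
    """Smallest angle between two parallel-jaw closing axes (pi-periodic)."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def select_grasp(candidates, region_center, region_radius, target_yaw,
                 max_yaw_error=math.radians(20)):
    """Keep candidates inside the pointed region whose yaw matches the
    gesture-derived yaw, then return the highest-scoring one (or None)."""
    feasible = [
        c for c in candidates
        if math.hypot(c.x - region_center[0], c.y - region_center[1]) <= region_radius
        and yaw_distance(c.yaw, target_yaw) <= max_yaw_error
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.score)
```

In this sketch, the gesture acts only as a hard constraint and the generator's own score breaks ties. A soft, weighted cost would be an equally plausible design.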

Abstract

Achieving precise and generalizable grasping across diverse objects and environments is essential for intelligent and collaborative robotic systems. However, existing approaches often struggle with ambiguous affordance reasoning and limited adaptability to unseen objects, leading to suboptimal grasp execution. In this work, we propose GAT-Grasp, a gesture-driven grasping framework that uses human hand gestures directly to guide the generation of task-specific grasp poses with appropriate position and orientation. Specifically, we introduce a retrieval-based affordance transfer paradigm that leverages the implicit correlation between hand gestures and object affordances to extract grasping knowledge from large-scale human-object interaction videos. By removing the need for predefined object priors, GAT-Grasp enables zero-shot generalization to novel objects and cluttered environments. Real-world evaluations confirm its robustness across diverse and unseen scenarios, demonstrating reliable grasp execution in complex task settings.
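The retrieval-based affordance transfer described above can be illustrated with a small sketch. It makes three assumptions:

* Each memory-bank entry stores a gesture embedding and an object embedding produced by some unspecified encoder.
* "Hierarchical retrieval" means a gesture-similarity shortlist followed by an object-similarity rerank.
* The retrieved contact point is transferred to the target image by nearest-neighbor feature matching inside the pointed region.

The function names, the two-stage order, and the nested-list feature maps are illustrative choices, not the paper's method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def hierarchical_retrieve(query_gesture, query_object, bank, top_k=5):
    """Hypothetical two-stage retrieval over (gesture_emb, object_emb) entries.

    Stage 1 shortlists the top_k entries by gesture similarity.
    Stage 2 returns the shortlisted index whose object embedding best matches.
    """
    shortlist = sorted(range(len(bank)),
                       key=lambda i: cosine(bank[i][0], query_gesture),
                       reverse=True)[:top_k]
    return max(shortlist, key=lambda i: cosine(bank[i][1], query_object))


def transfer_contact(src_feat, src_uv, tgt_feat, tgt_mask):
    """Map a source contact pixel (u, v) to the most similar target pixel.

    Feature maps are nested lists indexed [row][col] -> feature vector.
    Only target pixels where tgt_mask is True (the pointed region) are searched.
    Returns (col, row), or None if the mask is empty.
    """
    u, v = src_uv
    src_vec = src_feat[v][u]
    best, best_sim = None, -math.inf
    for row, (feat_row, mask_row) in enumerate(zip(tgt_feat, tgt_mask)):
        for col, (vec, inside) in enumerate(zip(feat_row, mask_row)):
            if not inside:
                continue
            sim = cosine(src_vec, vec)
            if sim > best_sim:
                best, best_sim = (col, row), sim
    return best
```

Restricting the search to the pointed region reflects the role the TL;DR assigns to the pointing gesture: it localizes where the transferred grasp should land.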

Paper Structure

This paper contains 16 sections, 5 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Illustration of our retrieval and transfer pipeline. Given a grasp gesture, our method accurately locates the robot grasp point (★) on the target object.
  • Figure 2: Schematic diagram of hand gesture mapping to the gripper. (Left) Alignment of the gesture coordinate system with the gripper coordinate system. (Right) Demonstration of gripper grasping using the computed rotation angle.
  • Figure 3: Visualization of different grasping tasks performed on real robots. The pointing gesture is shown in pink and the grasp gesture in blue; the robot then executes the corresponding actions based on these gestures.
  • Figure 4: Quantitative results of single-object grasping experiments. The x-axis represents the object part index: for seen objects, these include the bucket handle, bucket edge, coke ring, coke can body, and stapler; for unseen objects, these include the screwdriver handle, screwdriver tip, teapot handle, teapot lid, and plush toy. Each experiment was repeated 10 times to measure the success rate (SR).
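Figure 2 describes aligning the gesture coordinate system with the gripper coordinate system to obtain a rotation angle for the grasp. Below is a hedged sketch of that idea. It assumes:

* The hand is summarized by 2D thumb-tip and index-tip keypoints already expressed in the robot's planar frame, so camera calibration is handled elsewhere.
* The gripper is a parallel-jaw gripper approaching top-down.

The keypoint choice, the π-periodic wrap, and the helper names are assumptions for illustration, not the paper's exact mapping.

```python
import math


def wrap_half_pi(angle):
    """Wrap an angle to [-pi/2, pi/2); a parallel-jaw grasp is unchanged by a 180-degree flip."""
    return (angle + math.pi / 2) % math.pi - math.pi / 2


def gesture_to_gripper_yaw(thumb_tip, index_tip):
    """Hypothetical rule: align the gripper closing axis with the thumb-to-index vector."""
    dx = index_tip[0] - thumb_tip[0]
    dy = index_tip[1] - thumb_tip[1]
    if dx == 0 and dy == 0:
        raise ValueError("thumb and index keypoints coincide; orientation is undefined")
    return wrap_half_pi(math.atan2(dy, dx))


def top_down_grasp_rotation(yaw):
    """3x3 rotation for a top-down grasp (assumed convention).

    Columns: gripper x = closing axis rotated by yaw about world z,
    gripper y completes a right-handed frame, gripper z = approach along -z.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, s, 0.0],
            [s, -c, 0.0],
            [0.0, 0.0, -1.0]]
```

Wrapping to a half-turn range avoids commanding an unnecessary 180° wrist rotation, because both jaw assignments produce the same physical grasp.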