Single-View Scene Point Cloud Human Grasp Generation
Yan-Kang Wang, Chengyi Xing, Yi-Lin Wei, Xiao-Ming Wu, Wei-Shi Zheng
TL;DR
This work tackles generating physically plausible human grasps from single-view scene point clouds, a setting common in real-world perception but challenging because the object is only partially observed and the input contains many scene points. The authors introduce S2HGrasp, a framework with two modules: a Global Perception module that perceives the partial object point cloud globally, and DiffuGrasp, a diffusion-based grasp generator conditioned on scene features. They also release S2HGD, a large synthetic dataset of ~99,000 single-view point clouds covering 1,668 objects, to support learning and evaluation. Experiments show that the end-to-end S2HGrasp outperforms two-stage methods and baseline diffusion models, producing natural grasps with reduced penetration and generalizing well to unseen objects. The work advances hand-object interaction modeling from single real-world viewpoints and provides code and data for future research.
Abstract
In this work, we explore a novel task of generating human grasps based on single-view scene point clouds, which more accurately mirrors the typical real-world situation of observing objects from a single viewpoint. Due to the incompleteness of object point clouds and the presence of numerous scene points, the generated hand is prone to penetrating the invisible parts of the object, and the model is easily affected by scene points. To address these issues, we introduce S2HGrasp, a framework composed of two key modules: the Global Perception module, which globally perceives partial object point clouds, and the DiffuGrasp module, which is designed to generate high-quality human grasps from complex inputs that include scene points. Additionally, we introduce the S2HGD dataset, which comprises approximately 99,000 single-object single-view scene point clouds of 1,668 unique objects, each annotated with one human grasp. Our extensive experiments demonstrate that S2HGrasp can not only generate natural human grasps regardless of scene points, but also effectively prevent penetration between the hand and invisible parts of the object. Moreover, our model shows strong generalization capability when applied to unseen objects. Our code and dataset are available at https://github.com/iSEE-Laboratory/S2HGrasp.
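To make the incompleteness problem concrete, the following is a minimal sketch of how much of an object a single viewpoint can miss. It is an illustration only, not the S2HGD generation pipeline: the object is a synthetic unit sphere, visibility is approximated by simple back-face culling on surface normals, and the helper names `fibonacci_sphere` and `visible_from`, as well as the camera position, are hypothetical choices made for this example.

```python
import math

def fibonacci_sphere(n):
    """Return n roughly uniform points on the unit sphere.

    For a unit sphere centred at the origin, each point is also its own
    outward surface normal.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # evenly spaced heights in (-1, 1)
        r = math.sqrt(max(0.0, 1.0 - z * z))   # radius of the circle at that height
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points

def visible_from(points, normals, camera):
    """Keep points whose outward normal faces the camera (back-face culling).

    This is a crude stand-in for a real visibility test: it ignores
    self-occlusion and occlusion by other scene geometry.
    """
    kept = []
    for p, n in zip(points, normals):
        to_camera = tuple(c - q for c, q in zip(camera, p))
        if sum(a * b for a, b in zip(n, to_camera)) > 0.0:
            kept.append(p)
    return kept

# Assumed setup: a complete object and a single camera on the +z axis.
full = fibonacci_sphere(2000)
camera = (0.0, 0.0, 3.0)

partial = visible_from(full, full, camera)
ratio = len(partial) / len(full)
print(f"visible points: {len(partial)} of {len(full)} ({ratio:.1%})")
```

In this toy setup most of the surface is never observed, so a grasp generator that reasons only about the visible points has no direct evidence of where the hidden surface lies. This is the situation in which a generated hand can end up intersecting the unseen part of the object.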
