RegionGrasp: A Novel Task for Contact Region Controllable Hand Grasp Generation
Yilin Wang, Chuan Guo, Li Cheng, Hai Jiang
TL;DR
RegionGrasp tackles the problem of region-controllable hand grasp generation by proposing RegionGrasp-CVAE, a conditional variational autoencoder equipped with ConditionNet for region-aware object encoding and HOINet for interaction-aware hand-object coupling. The approach uses point-patch representations and a pretraining strategy to capture geometry, enabling low-level spatial control over the contact region and robust hand-object interactions. Across ObMan and GRAB datasets, RegionGrasp-CVAE achieves competitive region controllability (CR) and contact quality (CCA/IV), while delivering diverse grasps and good generalization to out-of-domain objects; user studies corroborate improved controllability without sacrificing naturalness. This work advances practical region-specific grasp synthesis for applications like VR, and suggests future integration with physics priors and language-informed priors to further enhance plausibility and control.
Abstract
Can a machine automatically generate multiple distinct and natural hand grasps, given a specific contact region on a 3D object? This question motivates a novel task, Region Controllable Hand Grasp Generation (RegionGrasp): given a 3D object together with a surface area selected as the intended contact region, generate a diverse set of plausible hand grasps of the object in which the thumb fingertip touches the object surface within the contact region. To address this task, we propose RegionGrasp-CVAE, which consists of two main parts. First, to enable contact-region awareness, we propose ConditionNet as the condition encoder, which contains a transformer-based object encoder, O-Enc. O-Enc is pretrained with a strategy in which point patches of the object surface are randomly masked and subsequently restored, so that it further captures the surface geometry of the object. Second, to realize interaction awareness, we introduce HOINet, which encodes hand-object interaction features by entangling high-level hand features with embedded object features through geometric-aware multi-head cross attention. Qualitative and quantitative evaluations demonstrate the effectiveness of our approach, which compares favorably with state-of-the-art methods.
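The two mechanisms named in the abstract, random masking of object point patches and cross attention between hand and object features, can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The function names (`split_into_patches`, `random_mask`, `cross_attention`), the nearest-center patch assignment, the 50% mask ratio, and the single-head attention without learned projections are all assumptions made for illustration. The abstract does not specify how patches are formed, what mask ratio is used, or what makes the attention "geometric-aware", so none of that is modeled here.

```python
import math
import random


def split_into_patches(points, centers):
    """Assign each 3D point to its nearest center (assumed grouping rule).

    Returns one list of points per center.
    """
    patches = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        patches[nearest].append(p)
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Choose which patch indices are hidden during pretraining.

    Returns (visible_indices, masked_indices). A reconstruction model
    would see only the visible patches and be trained to restore the rest.
    """
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return visible, sorted(masked)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross attention.

    Here queries stand in for hand features and keys/values for object
    features. Each output row is a weighted average of the value rows.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


# Toy usage: 64 random surface points, 8 patches, half of them masked.
rng = random.Random(0)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(64)]
centers = points[:8]  # hypothetical choice of patch centers
patches = split_into_patches(points, centers)
visible, masked = random_mask(len(patches), 0.5, rng)

# Toy usage: 2 hand feature vectors attend over 3 object feature vectors.
hand_feats = [[1.0, 0.0], [0.0, 1.0]]
obj_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(hand_feats, obj_feats, obj_feats)
```

In a trained model the queries, keys, and values would come from learned linear projections and be split across several heads; the sketch keeps only the core weighted-averaging step so the data flow from object features into hand features is easy to follow.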
