3D Whole-body Grasp Synthesis with Directional Controllability

Georgios Paschalidis, Romana Wilschut, Dimitrije Antić, Omid Taheri, Dimitrios Tzionas

TL;DR

This work addresses the challenge of realistically synthesizing 3D whole-body grasps of objects resting on receptacles, where data scarcity and the lack of receptacle awareness hinder prior methods. It introduces ReachingField to provide controllable reach directions, and two controllable generators, CReach for the reaching body and CGrasp for the grasping hand, both conditioned on a shared direction. The CWGrasp framework then combines these components with a geometry-aware optimization to produce dexterous SMPL-X grasps, running up to $16\times$ faster than FLEX and supporting both right- and left-hand grasps. Evaluations on GRAB and ReplicaGrasp show that CWGrasp yields realistic, controllable, and efficient whole-body grasps, and its results are preferred for realism in perceptual user studies. These results suggest practical impact for animation, robotics, and synthetic data generation, by enabling controllable, geometry-consistent whole-body grasp synthesis.

Abstract

Synthesizing 3D whole bodies that realistically grasp objects is useful for animation, mixed reality, and robotics. This is challenging, because the hands and body need to look natural w.r.t. each other, the grasped object, as well as the local scene (i.e., a receptacle supporting the object). Moreover, training data for this task is really scarce, while capturing new data is expensive. Recent work goes beyond finite datasets via a divide-and-conquer approach; it first generates a "guiding" right-hand grasp, and then searches for bodies that match this. However, the guiding-hand synthesis lacks controllability and receptacle awareness, so it likely has an implausible direction (i.e., a body can't match this without penetrating the receptacle) and needs corrections through major post-processing. Moreover, the body search needs exhaustive sampling and is expensive. These are strong limitations. We tackle these with a novel method called CWGrasp. Our key idea is that performing geometry-based reasoning "early on," instead of "too late," provides rich "control" signals for inference. To this end, CWGrasp first samples a plausible reaching-direction vector (used later for both the arm and hand) from a probabilistic model built via ray-casting from the object and collision checking. Moreover, CWGrasp uniquely tackles both right and left-hand grasps. We evaluate on the GRAB and ReplicaGrasp datasets. CWGrasp outperforms baselines, at lower runtime and budget, while all components help performance. Code and models are available at https://gpaschalidis.github.io/cwgrasp.

Paper Structure (22 sections, 13 equations, 24 figures, 3 tables)

This paper contains 22 sections, 13 equations, 24 figures, 3 tables.

Figures (24)

  • Figure 1: We develop CWGrasp, a novel framework for synthesizing 3D whole-body grasps for an object placed on a receptacle. Our framework builds on a novel combination of geometry-based reasoning and controllable data-driven synthesis methods. By adding controllability to the synthesis process, we achieve realistic results at a fraction of the computational cost of the state-of-the-art FLEX.
  • Figure 2: Controllable reaching-body synthesis (CReach). We show examples where multiple bodies (shown with several colors) are generated to reach a target wrist location (shown as a green sphere), while having a desired 3D arm direction (gray arrow).
  • Figure 3: Controllable hand-grasp synthesis. The goal is to grasp the red wineglass. Left -- GrabNet: Due to GrabNet's lack of controllability$^1$, sampling its latent space produces plausible grasps (shown with several colors) but with random direction. Right -- Our CGrasp: We add controllability, so drawing samples produces plausible and varied grasps (shown with several colors) that have a desired 3D palm direction (shown with a gray arrow).
  • Figure 4: CWGrasp framework. We first sample a single reaching direction from ReachingField. Next, we condition both CGrasp and CReach on the same direction and obtain a guiding hand grasp (shown in blue) and a reaching body (shown in gray), respectively, that satisfy the sampled direction, so they are "compatible" with each other. Finally, an optimization stage refines the body to match the guiding hand while resolving penetrations with the object and/or receptacle. Note that our framework can generate both left- and right-hand grasps. Parts in purple are used for both training and inference, in green only for training, in brown only for inference, and in red for optimization.
  • Figure 5: Arm/hand direction (ReachingField, Filter #1). Left: We cast rays from the object to the surrounding space. Right: We prune rays that intersect the receptacle and keep the non-intersecting ones; the latter represent directions from which an arm/hand can reach the object.
  • ...and 19 more figures
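
The ray-casting filter described in the Figure 5 caption can be illustrated with a small sketch. This is not the authors' implementation: the function names (`fibonacci_sphere`, `ray_hits_box`, `sample_reach_direction`) are hypothetical, the receptacle is approximated by a single axis-aligned box purely for illustration, and the uniform draw over surviving rays is a stand-in for the paper's probabilistic model, whose remaining filters are not shown here.

```python
import numpy as np


def fibonacci_sphere(n):
    """Return n roughly evenly spread unit directions, shape (n, 3)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * i
    return np.stack(
        [np.cos(azimuth) * np.sin(polar),
         np.sin(azimuth) * np.sin(polar),
         np.cos(polar)],
        axis=1,
    )


def ray_hits_box(origin, directions, box_min, box_max, eps=1e-12):
    """Slab test: True where the ray origin + t * direction (t >= 0) hits the box.

    The box stands in for the receptacle (an assumption of this sketch).
    """
    parallel = np.abs(directions) < eps
    safe = np.where(parallel, 1.0, directions)  # avoid division by zero
    t1 = (box_min - origin) / safe
    t2 = (box_max - origin) / safe
    t_lo = np.minimum(t1, t2)
    t_hi = np.maximum(t1, t2)
    # A ray parallel to a slab is unconstrained by it if the origin lies
    # inside that slab, and misses the box entirely otherwise.
    inside = (origin >= box_min) & (origin <= box_max)
    t_lo = np.where(parallel, np.where(inside, -np.inf, np.inf), t_lo)
    t_hi = np.where(parallel, np.where(inside, np.inf, -np.inf), t_hi)
    t_near = t_lo.max(axis=1)
    t_far = t_hi.min(axis=1)
    return (t_near <= t_far) & (t_far >= 0.0)


def sample_reach_direction(origin, box_min, box_max, n_rays=512, rng=None):
    """Cast rays from the object, prune blocked ones, pick one survivor.

    Uniform sampling over the surviving rays is a simplification.
    """
    rng = np.random.default_rng() if rng is None else rng
    directions = fibonacci_sphere(n_rays)
    free = directions[~ray_hits_box(origin, directions, box_min, box_max)]
    if len(free) == 0:
        raise ValueError("every ray is blocked by the receptacle")
    return free[rng.integers(len(free))]


# Toy scene: an object 5 cm above a table top modeled as a thin box.
object_center = np.array([0.0, 0.0, 0.05])
table_min = np.array([-1.0, -1.0, -0.1])
table_max = np.array([1.0, 1.0, 0.0])
direction = sample_reach_direction(
    object_center, table_min, table_max, rng=np.random.default_rng(0)
)
```

In this toy scene, rays pointing down into the table are pruned, while upward and sideways rays survive as candidate reaching directions. A real receptacle would require ray-mesh intersection rather than a box test.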