Learning Geometry-Aware Nonprehensile Pushing and Pulling with Dexterous Hands

Yunshuang Li, Yiyang Ling, Gaurav S. Sukhatme, Daniel Seita

TL;DR

Nonprehensile manipulation with dexterous hands is challenging due to complex hand-object interactions. This paper presents Geometry-aware Dexterous Pushing and Pulling (GD2P), a scalable pipeline that learns pre-contact hand poses conditioned on object geometry via a diffusion model and validates them through physics simulation and arm-hand planning. A large 1.3-million-pose dataset over 2.3k objects, combined with energy-based pose optimization and geometry-conditioned diffusion, enables robust pushing and pulling across diverse shapes and directions, including multi-step tasks and cross-hand morphologies. GD2P demonstrates real-world feasibility on Allegro and LEAP hands, outperforms baselines, and provides open-source data and code to accelerate research in dexterous nonprehensile manipulation.

Abstract

Nonprehensile manipulation, such as pushing and pulling, enables robots to move, align, or reposition objects that may be difficult to grasp due to their geometry, size, or relationship to the robot or the environment. Much of the existing work in nonprehensile manipulation relies on parallel-jaw grippers or tools such as rods and spatulas. In contrast, multi-fingered dexterous hands offer richer contact modes and the versatility to handle diverse objects while providing stable support, which compensates for the difficulty of modeling the dynamics of nonprehensile manipulation. Therefore, we propose Geometry-aware Dexterous Pushing and Pulling (GD2P) for nonprehensile manipulation with dexterous robotic hands. We study pushing and pulling by framing the problem as synthesizing and learning pre-contact dexterous hand poses that lead to effective manipulation. We generate diverse hand poses via contact-guided sampling, filter them using physics simulation, and train a diffusion model conditioned on object geometry to predict viable poses. At test time, we sample hand poses and use standard motion planners to select and execute pushing and pulling actions. We perform 840 real-world experiments with an Allegro Hand, comparing our method to baselines. The results indicate that GD2P offers a scalable route for training dexterous nonprehensile manipulation policies. We further demonstrate GD2P on a LEAP Hand, highlighting its applicability to different hand morphologies. Our pre-trained models and dataset, including 1.3 million hand poses across 2.3k objects, will be open-sourced to facilitate further research. Our project website is available at: geodex2p.github.io.
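The test-time step described in the abstract (sample candidate hand poses, discard those the motion planner rejects, execute the best remaining one) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `HandPose`, `select_pose`, the feasibility set, and the scores are all hypothetical stand-ins. In the actual pipeline, feasibility would come from a motion planner and the score from the paper's evaluation metric, neither of which is reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class HandPose:
    """Hypothetical container for one sampled pre-contact hand pose."""
    name: str


def select_pose(
    candidates: Sequence[HandPose],
    is_feasible: Callable[[HandPose], bool],
    score: Callable[[HandPose], float],
) -> Optional[HandPose]:
    """Return the highest-scoring feasible pose, or None if none is feasible.

    `is_feasible` stands in for an arm-hand motion-planning check and
    `score` for a pose-quality metric; both are assumptions of this sketch.
    """
    feasible = [pose for pose in candidates if is_feasible(pose)]
    if not feasible:
        return None
    return max(feasible, key=score)


# Toy data: four sampled poses, one rejected by the (stand-in) planner.
candidates = [HandPose("A"), HandPose("B"), HandPose("C"), HandPose("D")]
infeasible_names = {"C"}                                 # made-up planner result
toy_scores = {"A": 0.4, "B": 0.6, "C": 0.9, "D": 0.8}    # made-up scores

best = select_pose(
    candidates,
    is_feasible=lambda pose: pose.name not in infeasible_names,
    score=lambda pose: toy_scores[pose.name],
)
print(best)
```

Filtering before ranking means a pose with a high score but no collision-free arm trajectory is never selected, which is why "C" loses to "D" in this toy example despite its higher made-up score.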

Paper Structure

This paper contains 23 sections, 3 equations, 19 figures, 7 tables.

Figures (19)

  • Figure 1: Three examples of nonprehensile manipulation using GD2P with a 4-finger, 16-DOF Allegro Hand. The top row shows the starting object configuration with its goal rendered as a transparent overlay, while the bottom row shows the result after the robot's motion. GD2P synthesizes diverse hand poses conditioned on object geometry, handling flat (left), volumetric (middle), and tall (right) objects. Grey arrows represent the transporting direction, whereas white volumetric dots mark the estimated fingertip contact with the object.
  • Figure 2: Overview of GD2P. We present a large-scale dataset of hand poses specifically for pushing or pulling, and leverage it to train a diffusion model. At execution time, given an object, we obtain its basis point set representation \cite{prokudin2019BPS} and pass it to our trained diffusion model, which uses the architecture from \cite{weng2024dexdiffuser}. This model synthesizes diverse floating pre-contact hand poses formed from our large-scale data generation pipeline (Sec. \ref{ssec:hand_poses}). Given these hand poses, we then check their feasibility in a physics simulator by adding the arm back in and performing motion planning \cite{sundaralingam2023curobo}. We rank the feasible hand poses (e.g., "C" is infeasible in the example here since the motion planner detects an unavoidable collision between the arm and the table), select the best-performing one (e.g., "D" in our example, which has no detected collision and best facilitates the pushing direction), and execute it in the real world.
  • Figure 3: Examples of pushing and pulling hand poses from optimizing our energy function (Eq. \ref{eq:energy}). These have all been validated in IsaacGym simulation. In all examples, the intended object pushing direction is to the right. These data points are used to train our diffusion model (see Sec. \ref{ssec:diffusion_model}).
  • Figure 4: Visualization of $L_{\rm goal}$, $L_{\rm coll}$, and $L_{\rm dir}$ values in $V(H)$ from Eq. \ref{eq:eval_metric} on three simulated hand poses. See Sec. \ref{ssec:motion_planning} and Sec. \ref{ssec:real_results} for more details.
  • Figure 5: The objects we use in our real-world experiments, including 3D printed and common ("Daily") objects. See Sec. \ref{ssec:real_experiments} for more details.
  • ...and 14 more figures
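
The Figure 2 caption mentions encoding each object as a basis point set (BPS) before conditioning the diffusion model. As a rough illustration of the general BPS idea only (a fixed set of basis points, each mapped to its distance to the nearest point of the normalized object cloud), here is a small pure-Python sketch. The function names, the number of basis points, the unit-ball sampling, and the toy cube cloud are assumptions of this sketch, not the settings used in the paper.

```python
import math
import random


def sample_basis_points(n, radius=1.0, seed=0):
    """Sample n fixed basis points uniformly inside a ball (rejection sampling)."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        p = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if math.hypot(*p) <= radius:
            points.append(p)
    return points


def normalize_cloud(cloud):
    """Center the cloud at its centroid and scale it to fit the unit ball."""
    n = len(cloud)
    centroid = tuple(sum(p[i] for p in cloud) / n for i in range(3))
    centered = [tuple(p[i] - centroid[i] for i in range(3)) for p in cloud]
    # Guard against a degenerate cloud whose points all coincide.
    scale = max(math.hypot(*p) for p in centered) or 1.0
    return [tuple(c / scale for c in p) for p in centered]


def bps_encode(cloud, basis):
    """For each basis point, return the distance to its nearest cloud point."""
    return [min(math.dist(b, p) for p in cloud) for b in basis]


# Toy object: the eight corners of a cube with side length 2.
cube = [(x, y, z) for x in (0.0, 2.0) for y in (0.0, 2.0) for z in (0.0, 2.0)]
unit_cloud = normalize_cloud(cube)
basis = sample_basis_points(32)           # 32 is an arbitrary choice here
feature = bps_encode(unit_cloud, basis)   # fixed-length vector, one value per basis point
print(len(feature))
```

Because the basis points are fixed across objects, every object maps to a vector of the same length regardless of how many points its cloud contains, which makes the encoding convenient as a conditioning input for a learned model.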