Learning Diverse and Physically Feasible Dexterous Grasps with Generative Model and Bilevel Optimization

Albert Wu, Michelle Guo, C. Karen Liu

TL;DR

The paper addresses generating diverse, physically feasible dexterous grasps for novel objects. It proposes a hybrid pipeline: a conditional variational autoencoder (CVAE) first predicts finger placements from the object point cloud, and a bilevel optimization (BO) then refines that prediction into a grasp configuration that satisfies wrench closure, friction cone, reachability, and collision constraints. The method is validated on hardware with an Allegro hand mounted on a Panda arm, achieving an 86.7% success rate over 120 grasping trials on 20 household objects. Quantitative evaluations confirm that the planned grasps satisfy the kinematic and dynamic constraints. The results indicate that combining learning with physics-informed optimization yields robust, diverse grasp configurations for dexterous manipulation.

Abstract

To fully utilize the versatility of a multi-fingered dexterous robotic hand for executing diverse object grasps, one must consider the rich physical constraints introduced by hand-object interaction and object geometry. We propose an integrative approach of combining a generative model and a bilevel optimization (BO) to plan diverse grasp configurations on novel objects. First, a conditional variational autoencoder trained on merely six YCB objects predicts the finger placement directly from the object point cloud. The prediction is then used to seed a nonconvex BO that solves for a grasp configuration under collision, reachability, wrench closure, and friction constraints. Our method achieved an 86.7% success over 120 real world grasping trials on 20 household objects, including unseen and challenging geometries. Through quantitative empirical evaluations, we confirm that grasp configurations produced by our pipeline are indeed guaranteed to satisfy kinematic and dynamic constraints. A video summary of our results is available at youtu.be/9DTrImbN99I.
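
The friction and wrench constraints named in the abstract can be illustrated with a small feasibility check. The sketch below is not the authors' bilevel optimization. For given contact points, inward unit normals, and candidate contact forces, it only tests whether each force lies inside a Coulomb friction cone and whether the net force and torque vanish. This is a necessary ingredient of a stable grasp, not a complete wrench closure test. The function names, the friction coefficient `mu`, and the three-contact example are assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def in_friction_cone(force, normal, mu):
    """Coulomb cone test; `normal` is a unit vector pointing into the object."""
    f_n = dot(force, normal)  # normal component of the contact force
    if f_n <= 0.0:
        return False  # the finger must push on the object, not pull
    tangential = tuple(f - f_n * n for f, n in zip(force, normal))
    return norm(tangential) <= mu * f_n + 1e-9


def net_wrench(points, forces):
    """Sum of contact forces and of torques about the origin."""
    total_f = [0.0, 0.0, 0.0]
    total_t = [0.0, 0.0, 0.0]
    for p, f in zip(points, forces):
        t = cross(p, f)
        for k in range(3):
            total_f[k] += f[k]
            total_t[k] += t[k]
    return tuple(total_f), tuple(total_t)


def is_equilibrium_grasp(points, normals, forces, mu, tol=1e-6):
    """True if every force is in its cone and the net wrench is (near) zero."""
    cone_ok = all(in_friction_cone(f, n, mu) for f, n in zip(forces, normals))
    total_f, total_t = net_wrench(points, forces)
    return cone_ok and norm(total_f) <= tol and norm(total_t) <= tol


# Hypothetical example: three fingertips spaced 120 degrees apart on a unit
# circle, each pressing toward the center with unit force.
angles = [0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0]
points = [(math.cos(a), math.sin(a), 0.0) for a in angles]
normals = [(-p[0], -p[1], 0.0) for p in points]
forces = list(normals)
balanced = is_equilibrium_grasp(points, normals, forces, mu=0.5)
```

In this symmetric example the three forces cancel and produce no torque, so `balanced` evaluates to `True`. A force with too large a tangential component, such as `(-1.0, 0.6, 0.0)` against the normal `(-1.0, 0.0, 0.0)` with `mu=0.5`, fails the cone test.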

Paper Structure (24 sections, 6 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Our method can pick up different object shapes with a diverse set of grasp configurations.
  • Figure 2: Overview of our method. We train a CVAE that predicts finger placements $\mathcal{P}\in\mathbb{R}^{3\times 3}$ given an object point cloud $\bm{O}$. At inference time, we first obtain a finger placement prediction $\mathcal{P}$, which is not guaranteed to be physically feasible. Next, we compute a grasp configuration initial guess $\bm{q}'\in\mathbb{R}^{22}$ from $\mathcal{P}$. Finally, we apply BO to compute a physically feasible grasp $\bm{q}^*\in\mathbb{R}^{22}$.
  • Figure 3: Before BO and after BO: bottom-up views of a mustard bottle grasp. With the color encoding red: thumb, green: index, and blue: middle, we show $\mathcal{P}$ (solid spheres), $\mathcal{P}'$ (transparent spheres), and $\hat{\bm{n}}_i$ (colored lines). The directions of the computed contact forces $\bm{f}_i$ are shown at the respective fingertips with yellow lines. The mismatch between $K(\bm{q}')$ and $\mathcal{P}'$ is due to the numerical tolerance $\epsilon$. Hardware setup: see the hardware setup section for more details. Grasp sim-to-real: correspondence between the planned grasp in simulation and its execution on hardware.
  • Figure 4: Diverse grasps generated by our method. Each image is a unique grasp generated from different sampled latent variables. All grasps were computed for the same initial object rest pose.

Theorems & Definitions (1)

  • Definition 1: Physically Feasible Dexterous Grasp