Physics-Aware Iterative Learning and Prediction of Saliency Map for Bimanual Grasp Planning

Shiyao Wang, Xiuping Liu, Charlie C. L. Wang, Jian Liu

TL;DR

The paper tackles the challenge of bimanual grasp planning by leveraging abundant single-handed grasp saliency data to predict physically plausible bimanual contact regions without requiring large-scale bimanual annotations. It introduces a physics-aware iterative learning pipeline comprising BSPN, BCPN, a physics-balance loss, and a physics-aware refinement that enforce balance and generalize to unseen objects, followed by grasp synthesis via ContactGrasp with MANO. Key contributions include the saliency corresponding vector framework, iterative updates of the single-handed saliency map, and a refinement module that enforces physical stability, achieving high bimanual grasp success in simulation (e.g., 92.5% across 80 shapes) and superior performance to single-handed baselines. This work reduces data requirements and enhances robustness of bimanual grasping for household objects, with practical impact on dexterous manipulation in unstructured environments.

Abstract

Learning the skill of human bimanual grasping can extend the capabilities of robotic systems when grasping large or heavy objects. However, it requires a much larger search space for grasp points than single-hand grasping and numerous bimanual grasping annotations for network learning, making both data-driven and analytical grasping methods inefficient and insufficient. We propose a framework for bimanual grasp saliency learning that aims to predict the contact points for bimanual grasping based on existing human single-handed grasping data. We learn saliency corresponding vectors from minimal bimanual contact annotations; these vectors establish correspondences between the grasp positions of both hands, eliminating the need to train on a large-scale bimanual grasp dataset. The existing single-handed grasp saliency value serves as the initial value for bimanual grasp saliency, and we learn a saliency adjusted score that is added to the initial value to obtain the final bimanual grasp saliency value, making it possible to predict preferred bimanual grasp positions from single-handed grasp saliency. We also introduce a physics-balance loss function and a physics-aware refinement module that enable physical grasp balance and enhance generalization to unknown objects. Comprehensive experiments in simulation and comparisons on dexterous grippers demonstrate that our method can achieve balanced bimanual grasping effectively.
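
The saliency update described in the abstract can be illustrated with a minimal Python sketch. Only the additive rule (final value = initial single-handed value + learned adjusted score) comes from the abstract. The function names, the list-based data layout, and the reading of a saliency corresponding vector as a 3D offset from one hand's grasp position to the other's are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def bimanual_saliency(initial: Sequence[float],
                      adjustment: Sequence[float]) -> List[float]:
    """Add the learned adjusted score to the single-handed saliency, per point."""
    if len(initial) != len(adjustment):
        raise ValueError("saliency and adjustment must have one value per point")
    return [s + d for s, d in zip(initial, adjustment)]


def corresponding_point(p: Point, v: Point) -> Point:
    """ASSUMPTION: the corresponding vector is an offset from one hand's
    grasp position to the matching position for the other hand."""
    return (p[0] + v[0], p[1] + v[1], p[2] + v[2])


# Hypothetical values for a three-point cloud.
single_handed = [0.5, 0.25, 0.0]
delta_s = [0.25, -0.25, 0.5]
final = bimanual_saliency(single_handed, delta_s)
partner = corresponding_point((0.0, 1.0, 2.0), (1.0, 0.0, -2.0))
```

In this sketch a negative adjusted score lowers the saliency of a point that is attractive for one hand but less suitable for two, while a positive score raises it.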

Paper Structure (20 sections, 11 equations, 10 figures, 1 table)


Figures (10)

  • Figure 1: Average grasp coverage for 8 categories of objects, with examples of single-handed and bimanual grasp poses. The grasp coverage is defined as the percentage of users whose labeled single-handed grasping positions are covered by bimanual grasping positions.
  • Figure 2: Overview of our physics-aware iterative learning and prediction of the bimanual saliency map for novel objects. We first use a pre-trained SSPN model to compute the single-handed saliency map from an input point cloud. Next, the proposed BSPN network takes both the point cloud and the single-handed saliency map and predicts the bimanual saliency map. The bimanual contact points are then computed by applying the BCPN network to the point cloud and the BSPN result. Finally, we refine the bimanual saliency map to conform to physical stability using the physics-aware refinement module.
  • Figure 3: The network architecture of BSPN. The encoder consists of a 3-layer MLP, and the decoder consists of a 4-layer MLP that takes as input the concatenation of 64-dimensional and 1024-dimensional features. The decoder outputs an $m$-dimensional prediction. For the saliency corresponding vector $V$ and the saliency adjusted score $\Delta S$, $m$ is $3$ and $1$, respectively. The total training loss consists of three terms: Correspondence Loss $L_c$, Adjustment Loss $L_a$, and Physics-aware Balance Loss $L_p$.
  • Figure 4: Ablation study. We compare our method with several alternatives to demonstrate the rationale behind our model and loss function. We show the saliency maps predicted by the baseline variants and evaluate their plausibility using the BCACR metric.
  • Figure 5: Gallery of grasp saliency maps and contact points predicted by our approach. The blue and red points denote the predicted contact points of the two hands, respectively.
  • ...and 5 more figures
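
The data flow in the Figure 2 caption (SSPN, then BSPN, then BCPN, then physics-aware refinement) can be sketched as a chain of stages. This is a structural sketch only: the function name `predict_bimanual_grasp_regions`, the type aliases, the representation of contacts as one index list per hand, and the inputs given to the refinement stage are all assumptions, and the stages are passed in as callables because the caption does not specify their internals.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
Saliency = List[float]
# ASSUMPTION: contacts are stored as point indices, one list per hand.
Contacts = Tuple[List[int], List[int]]


def predict_bimanual_grasp_regions(
    points: Sequence[Point],
    sspn: Callable[[Sequence[Point]], Saliency],
    bspn: Callable[[Sequence[Point], Saliency], Saliency],
    bcpn: Callable[[Sequence[Point], Saliency], Contacts],
    refine: Callable[[Sequence[Point], Saliency, Contacts], Saliency],
) -> Tuple[Saliency, Contacts]:
    """Chain the four stages in the order given by the Figure 2 caption."""
    single = sspn(points)                    # single-handed saliency map
    bimanual = bspn(points, single)          # bimanual saliency map
    contacts = bcpn(points, bimanual)        # contact points for both hands
    # ASSUMPTION: refinement sees the points, the saliency map, and the contacts.
    refined = refine(points, bimanual, contacts)
    return refined, contacts


# Placeholder stages that only exercise the data flow; they are not trained models.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
saliency, hands = predict_bimanual_grasp_regions(
    cloud,
    sspn=lambda pts: [0.5] * len(pts),
    bspn=lambda pts, s: [x + 0.25 for x in s],
    bcpn=lambda pts, s: ([0], [len(pts) - 1]),
    refine=lambda pts, s, c: list(s),
)
```

Passing the stages as arguments keeps the ordering explicit and lets each network be swapped for a stub when testing the surrounding code.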