Constrained 6-DoF Grasp Generation on Complex Shapes for Improved Dual-Arm Manipulation

Gaurav Singh, Sanket Kalwar, Md Faizal Karim, Bipasha Sen, Nagamanikandan Govindan, Srinath Sridhar, K Madhava Krishna

TL;DR

This work tackles constrained 6-DoF grasp generation on complex shapes for dual-arm manipulation by introducing CGDF, a diffusion-based model that uses part-guided diffusion and convolutional plane features to generate dense, region-specific grasps without requiring constraint-augmented training data. The method operates on the SE(3) manifold with an energy-based diffusion framework, employing Logmap/Expmap mappings and a neural energy decoder to score grasps, while guiding diffusion toward target regions via a max-energy formulation. Empirical results on the DA$^2$ dataset show CGDF outperforms state-of-the-art constrained and unconstrained baselines in Force Closure, Grasp Success Rate, and Target Grasps, validating its effectiveness for dual-arm planning and broader object geometries. The approach promises practical impact for robust, region-aware manipulation in real-world robotics by enabling sample-efficient constrained grasping without extensive labeled datasets.
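The Logmap/Expmap step mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the simplification of treating a pose as a translation plus a rotation vector (SO(3) × R³) instead of the coupled SE(3) exponential, and all function names (`expmap_so3`, `logmap_so3`, `pose_to_vec`, `vec_to_pose`, `perturb_pose`) are hypothetical.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its 3x3 skew-symmetric matrix."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def expmap_so3(w):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near the identity
    a = np.sin(theta) / theta
    b = (1.0 - np.cos(theta)) / theta**2
    return np.eye(3) + a * K + b * (K @ K)

def logmap_so3(R):
    """Rotation matrix -> rotation vector (valid for angles below pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * vee
    return theta / (2.0 * np.sin(theta)) * vee

def pose_to_vec(H):
    """4x4 homogeneous pose -> flat 6-vector [translation, rotation vector]."""
    return np.concatenate([H[:3, 3], logmap_so3(H[:3, :3])])

def vec_to_pose(v):
    """Flat 6-vector [translation, rotation vector] -> 4x4 homogeneous pose."""
    H = np.eye(4)
    H[:3, :3] = expmap_so3(v[3:])
    H[:3, 3] = v[:3]
    return H

def perturb_pose(H, sigma, rng):
    """Add Gaussian noise in the flat vector space, then map back to a pose."""
    return vec_to_pose(pose_to_vec(H) + sigma * rng.normal(size=6))
```

Working in the flat vector space is what lets Gaussian noise and gradient steps be applied to a grasp pose while the result stays a valid rigid transform after mapping back.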

Abstract

Efficiently generating grasp poses tailored to specific regions of an object is vital for various robotic manipulation tasks, especially in a dual-arm setup. This scenario presents a significant challenge due to the complex geometries involved, requiring a deep understanding of the local geometry to generate grasps efficiently on the specified constrained regions. Existing methods only explore settings involving table-top/small objects and require augmented datasets to train, limiting their performance on complex objects. We propose CGDF: Constrained Grasp Diffusion Fields, a diffusion-based grasp generative model that generalizes to objects with arbitrary geometries, as well as generates dense grasps on the target regions. CGDF uses a part-guided diffusion approach that enables it to get high sample efficiency in constrained grasping without explicitly training on massive constraint-augmented datasets. We provide qualitative and quantitative comparisons using analytical metrics and in simulation, in both unconstrained and constrained settings to show that our method can generalize to generate stable grasps on complex objects, especially useful for dual-arm manipulation settings, while existing methods struggle to do so.
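One plausible reading of the part-guided, max-energy idea is that two energies (one conditioned on the whole object, one on the target region) are combined with a pointwise maximum, so a sample has low energy only where both agree. The toy below illustrates that reading on 2D points rather than grasp poses. Everything in it is an assumption made for illustration: the two hand-written energies stand in for the learned model, and the sampler is a plain annealed Langevin loop, not the paper's procedure.

```python
import numpy as np

CENTER, HALF_WIDTH = 2.0, 0.5  # hypothetical target band: x in [1.5, 2.5]

def energy_full(p):
    """Toy 'whole object' energy: low anywhere on the bar y = 0."""
    return p[:, 1] ** 2

def grad_full(p):
    g = np.zeros_like(p)
    g[:, 1] = 2.0 * p[:, 1]
    return g

def energy_part(p):
    """Toy 'target region' energy: low inside the band, ignores y."""
    d = np.maximum(np.abs(p[:, 0] - CENTER) - HALF_WIDTH, 0.0)
    return d ** 2

def grad_part(p):
    off = p[:, 0] - CENTER
    d = np.maximum(np.abs(off) - HALF_WIDTH, 0.0)
    g = np.zeros_like(p)
    g[:, 0] = 2.0 * d * np.sign(off)
    return g

def max_energy(p):
    """Combined energy: low only where both energies are low."""
    return np.maximum(energy_full(p), energy_part(p))

def grad_max_energy(p):
    """Subgradient of the max: follow whichever energy is currently larger."""
    use_full = (energy_full(p) >= energy_part(p))[:, None]
    return np.where(use_full, grad_full(p), grad_part(p))

def sample(n=256, n_steps=400, step=0.05, seed=0):
    """Annealed Langevin-style sampling on the combined energy."""
    rng = np.random.default_rng(seed)
    p = rng.normal(scale=2.0, size=(n, 2))
    for k in range(n_steps):
        # Noise scale decays linearly and reaches zero at the halfway point.
        sigma = 0.5 * max(0.0, 1.0 - 2.0 * k / n_steps)
        noise = rng.normal(size=p.shape)
        p = p - step * grad_max_energy(p) + np.sqrt(2.0 * step) * sigma * noise
    return p
```

In this toy, descending `energy_full` alone would spread samples along the whole bar, and `energy_part` alone would leave them anywhere above or below the band; the maximum drives them onto the part of the bar inside the band.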

Paper Structure (11 sections, 12 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: CGDF: Constrained Grasp Diffusion Fields generates dense grasps on large objects with complex shapes (like a chair) in a dual-arm setting. Given target regions (which can be generated from a text prompt using PartSLIP [liu2023partslip]), CGDF uses a Part-Guided Diffusion strategy to generate grasps on the specified regions in a sample-efficient way, enabling grasps on multiple regions for improved multi-arm grasping.
  • Figure 2: Overview: Panel (a) shows the architecture of our proposed energy-based model $E_{\theta}$, as explained in \ref{subsec:arch} and \ref{subsec:diffmodel}. The model takes as input a point cloud and the grasp pose, which is subsequently converted into a set of query points. We use a VN-Pointnet-based point cloud encoder, which generates per-point features. Panel (b) shows how these features are then distilled into three 2D feature planes oriented along the XY, XZ, and YZ planes using a convolutional multi-plane encoder. For each grasp pose, feature vectors corresponding to $N$ query points are obtained using bilinear interpolation on the feature planes. Subsequently, the grasp feature vector is derived from $F_{\theta}$ and decoded into an energy value by $D_{\theta}$. Panel (c) shows the grasp diffusion process, where grasps are diffused over the object during the forward diffusion process and denoised using the backward diffusion process.
  • Figure 3: Part-Guided Diffusion strategy: Using two instances of CGDF, one conditioned on the full point cloud and the other on the target region, we show how our part-guided diffusion works.
  • Figure 4: Qualitative results: PartSLIP generates the constrained regions from the given text prompts, and CGDF then generates grasps on the proposed regions.
  • Figure 5: Qualitative comparison of unconstrained and constrained grasp generation in a dual-arm setting for CGDF and the baselines: VCGS [vcgs] and SE3Diff [se3dif]. All the baselines perform well for objects with simple shapes (like planar or elongated shapes). However, for relatively complex geometries like chairs, instruments, etc., CGDF generates dense constrained and unconstrained grasps, whereas the baselines struggle to do so. The green grasps are non-colliding, and the red grasps are colliding. Zoom in for a better experience.
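
The feature lookup described in the Figure 2 caption (three 2D feature planes queried by bilinear interpolation) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the function names are hypothetical, query points are assumed to be normalized to $[0, 1]^3$, and summing the three per-plane features is an assumption, since the caption does not say how they are combined.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly interpolate a (C, H, W) feature plane at (N, 2) points.

    uv[:, 0] runs along columns and uv[:, 1] along rows, both in [0, 1].
    Returns an (N, C) array of features.
    """
    _, H, W = plane.shape
    x = np.clip(uv[:, 0], 0.0, 1.0) * (W - 1)
    y = np.clip(uv[:, 1], 0.0, 1.0) * (H - 1)
    # Lower-left grid node, kept one step inside the upper border.
    x0 = np.minimum(np.floor(x).astype(int), W - 2)
    y0 = np.minimum(np.floor(y).astype(int), H - 2)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0
    # Each lookup has shape (C, N); the weights broadcast over channels.
    out = (plane[:, y0, x0] * (1 - wx) * (1 - wy)
           + plane[:, y0, x1] * wx * (1 - wy)
           + plane[:, y1, x0] * (1 - wx) * wy
           + plane[:, y1, x1] * wx * wy)
    return out.T

def query_triplane(planes, pts):
    """Look up features for (N, 3) query points on the XY, XZ, and YZ planes.

    `planes` is a dict with keys "xy", "xz", "yz", each a (C, H, W) array.
    The per-plane features are summed (an assumption of this sketch).
    """
    return (bilinear_sample(planes["xy"], pts[:, [0, 1]])
            + bilinear_sample(planes["xz"], pts[:, [0, 2]])
            + bilinear_sample(planes["yz"], pts[:, [1, 2]]))
```

For a grasp pose, the $N$ query points would first be transformed into the object frame and normalized; the resulting `(N, C)` features are what a downstream feature network and energy decoder would consume.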