
Efficient End-to-End Detection of 6-DoF Grasps for Robotic Bin Picking

Yushi Liu, Alexander Qualmann, Zehao Yu, Miroslav Gabriel, Philipp Schillinger, Markus Spies, Ngo Anh Vien, Andreas Geiger

TL;DR

This work tackles 6-DoF grasp detection for cluttered bin picking using a single top-down depth view. It introduces a probabilistic grasp distribution based on Power-Spherical distributions to model multiple grasp orientations per contact point and uncertainty, enabling training on diverse ground-truth grasps. A two-stage end-to-end network predicts dense, collision-free grasps and demonstrates superior performance in simulation and real-robot experiments, achieving around 90% object clearing and outperforming baselines. The approach shows strong robustness to noisy inputs and sim-to-real transfer, highlighting practical impact for industrial bin-picking tasks.

Abstract

Bin picking is an important building block for many robotic systems in logistics, production, and household use cases. In recent years, machine learning methods for predicting 6-DoF grasps on diverse and unknown objects have shown promising progress. However, existing approaches consider only a single ground-truth grasp orientation per grasp location during training and can therefore predict only a limited set of grasp orientations, which reduces the number of feasible grasps in bin picking with restricted reachability. In this paper, we propose a novel approach for learning dense and diverse 6-DoF grasps for parallel-jaw grippers in robotic bin picking. We introduce a parameterized grasp distribution model based on Power-Spherical distributions that enables training on all possible ground-truth samples. The model also accounts for grasp uncertainty, which enhances its robustness to noisy inputs. As a result, given a single top-down depth image, our model can generate diverse grasps with multiple collision-free grasp orientations. Experimental evaluations in simulation and on a real robotic bin-picking setup demonstrate the model's ability to generalize across various object categories, achieving an object clearing rate of around $90 \%$ in both simulation and real-world experiments. We also outperform state-of-the-art approaches. Moreover, owing to the probabilistic grasp distribution modeling, the proposed approach is usable in real-robot experiments without any refinement steps, even when trained only on a synthetic dataset.
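To make the idea of fitting a Power-Spherical distribution to several ground-truth directions concrete, the following is a minimal sketch, not the authors' implementation. It assumes directions on the unit sphere $S^2$ and uses the general Power-Spherical form, a density proportional to $(1 + \mu^\top x)^\kappa$, whose normalizer on $S^2$ is $\pi \, 2^{\kappa+2} / (\kappa+1)$. The function names `ps_log_density` and `mean_nll` are hypothetical, and the parameterization the paper actually uses may differ.

```python
import math


def ps_log_density(x, mu, kappa):
    """Log-density of a Power-Spherical distribution on the unit sphere S^2.

    x, mu: unit 3-vectors given as sequences of three floats.
    kappa: concentration (kappa >= 0); kappa = 0 gives the uniform density.
    """
    dot = sum(a * b for a, b in zip(x, mu))
    # Clamp so the antipodal point (dot = -1) does not hit log(0).
    dot = max(dot, -1.0 + 1e-12)
    # Normalizer on S^2: integral of (1 + mu.x)^kappa = pi * 2^(kappa+2) / (kappa+1)
    log_norm = (math.log(math.pi)
                + (kappa + 2.0) * math.log(2.0)
                - math.log(kappa + 1.0))
    return kappa * math.log1p(dot) - log_norm


def mean_nll(samples, mu, kappa):
    """Mean negative log-likelihood of several ground-truth directions."""
    return -sum(ps_log_density(x, mu, kappa) for x in samples) / len(samples)


# Illustrative (made-up) ground-truth directions at one contact point.
samples = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8)]
nll_aligned = mean_nll(samples, (0.0, 0.0, 1.0), 5.0)
nll_opposed = mean_nll(samples, (0.0, 0.0, -1.0), 5.0)
```

A mean direction aligned with the samples yields a lower negative log-likelihood than an opposed one. In a learned model, this is the signal that would push predicted parameters toward covering all ground-truth orientations rather than a single one.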

Paper Structure (26 sections, 16 equations, 6 figures, 1 table)


Figures (6)

  • Figure 1: Schematic overview of the training and inference pipelines of the proposed 6-DoF grasp prediction model. The network is trained to predict the optimal parameter $\theta$ of the grasp distribution $P_\theta(g | V)$ by maximizing the likelihood of ground-truth grasp samples. During inference, the model predicts grasps with multiple orientations, together with graspability and collision scores. Green forks indicate 6-DoF grasps.
  • Figure 2: Contact grasp.
  • Figure 3: A baseline vector distribution (left), modeled as a Power-Spherical (PS) distribution $PS(\nu_\mathbf{c}, \kappa_\mathbf{c})$; an example of the grasp configuration at a contact point (bottom right), where red and green forks represent colliding and collision-free grasps, respectively; and the graphical model (top right) illustrating the conditional dependencies between variables.
  • Figure 4: The two-stage network architecture.
  • Figure 5: Predicted grasps from GIGA (left), ours (middle), and our uncertainty prediction (right) on a medium test scene.
  • ...and 1 more figures