Efficient End-to-End Detection of 6-DoF Grasps for Robotic Bin Picking
Yushi Liu, Alexander Qualmann, Zehao Yu, Miroslav Gabriel, Philipp Schillinger, Markus Spies, Ngo Anh Vien, Andreas Geiger
TL;DR
This work tackles 6-DoF grasp detection for cluttered bin picking from a single top-down depth view. It introduces a probabilistic grasp distribution based on Power-Spherical distributions that models both multiple grasp orientations per contact point and grasp uncertainty, enabling training on diverse ground-truth grasps. A two-stage end-to-end network predicts dense, collision-free grasps and outperforms baselines in simulation and real-robot experiments, achieving an object clearing rate of around 90%. The approach is robust to noisy inputs and transfers from simulation to the real robot, which makes it practical for industrial bin-picking tasks.
Abstract
Bin picking is an important building block for many robotic systems in logistics, production, and household use cases. In recent years, machine learning methods for predicting 6-DoF grasps on diverse and unknown objects have shown promising progress. However, existing approaches consider only a single ground-truth grasp orientation per grasp location during training and can therefore predict only a limited set of grasp orientations, which reduces the number of feasible grasps in bin picking, where reachability is restricted. In this paper, we propose a novel approach for learning dense and diverse 6-DoF grasps for parallel-jaw grippers in robotic bin picking. We introduce a parameterized grasp distribution model based on Power-Spherical distributions that enables training on all possible ground-truth samples. The model also accounts for grasp uncertainty, which enhances its robustness to noisy inputs. As a result, given a single top-down depth image, our model can generate diverse grasps with multiple collision-free grasp orientations. Experimental evaluations in simulation and on a real robotic bin-picking setup demonstrate the model's ability to generalize across various object categories, achieving an object clearing rate of around $90\%$ in both simulation and real-world experiments. We also outperform state-of-the-art approaches. Moreover, owing to the probabilistic grasp distribution modeling, the proposed approach is usable in real-robot experiments without any refinement steps, even when trained only on a synthetic dataset.
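As background for the grasp distribution model mentioned above, the following is a minimal sketch of the standard Power-Spherical log-density on the unit sphere S^2, where p(x) is proportional to (1 + mu·x)^kappa with mean direction mu and concentration kappa. It is not the paper's parameterization, which this abstract does not specify. The function name `power_spherical_log_pdf` and the restriction to 3D unit vectors are assumptions made for illustration only.

```python
import math
import numpy as np

def power_spherical_log_pdf(x, mu, kappa):
    """Log-density of a Power-Spherical distribution on S^2 (hypothetical helper).

    x:     array of shape (..., 3), directions to evaluate (normalized internally)
    mu:    array of shape (3,), mean direction (normalized internally)
    kappa: concentration >= 0; kappa = 0 gives the uniform density on the sphere
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    x = x / np.linalg.norm(x, axis=-1, keepdims=True)
    mu = mu / np.linalg.norm(mu)
    # Clip to guard against dot products drifting just outside [-1, 1].
    cos_angle = np.clip(x @ mu, -1.0, 1.0)
    # Normalizer on S^2: integral of (1 + cos)^kappa over the sphere
    # equals pi * 2^(kappa + 2) / (kappa + 1).
    log_norm = (kappa + 2.0) * math.log(2.0) + math.log(math.pi) - math.log(kappa + 1.0)
    return kappa * np.log1p(cos_angle) - log_norm

if __name__ == "__main__":
    mu = np.array([0.0, 0.0, 1.0])
    directions = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
    # Density is highest at the mean direction and falls off away from it.
    print(np.exp(power_spherical_log_pdf(directions, mu, kappa=10.0)))
```

A larger kappa concentrates probability mass around mu, while a small kappa spreads it out, which is one way a single parameter can express how uncertain a predicted direction is.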
