Efficient End-to-End Detection of 6-DoF Grasps for Robotic Bin Picking

Yushi Liu; Alexander Qualmann; Zehao Yu; Miroslav Gabriel; Philipp Schillinger; Markus Spies; Ngo Anh Vien; Andreas Geiger

TL;DR

This work tackles 6-DoF grasp detection for cluttered bin picking from a single top-down depth image. It introduces a probabilistic grasp distribution based on Power-Spherical distributions that models both multiple grasp orientations per contact point and grasp uncertainty, enabling training on diverse ground-truth grasps. A two-stage end-to-end network predicts dense, collision-free grasps and outperforms baselines, achieving an object clearing rate of around 90% in simulation and real-robot experiments. The approach is robust to noisy inputs and transfers from simulation to the real robot, which makes it practical for industrial bin-picking tasks.

Abstract

Bin picking is an important building block for many robotic systems in logistics, production, and household use cases. In recent years, machine learning methods for predicting 6-DoF grasps on diverse and unknown objects have shown promising progress. However, existing approaches consider only a single ground-truth grasp orientation per grasp location during training and can therefore predict only a limited set of grasp orientations, which reduces the number of feasible grasps in bin picking with restricted reachability. In this paper, we propose a novel approach for learning dense and diverse 6-DoF grasps for parallel-jaw grippers in robotic bin picking. We introduce a parameterized grasp distribution model based on Power-Spherical distributions that enables training on all possible ground-truth samples. The model also captures grasp uncertainty, which enhances its robustness to noisy inputs. As a result, given a single top-down depth image, our model can generate diverse grasps with multiple collision-free grasp orientations. Experimental evaluations in simulation and on a real robotic bin-picking setup demonstrate the model's ability to generalize across various object categories, achieving an object clearing rate of around $90 \%$ in both simulation and real-world experiments. We also outperform state-of-the-art approaches. Moreover, owing to the probabilistic grasp distribution modeling, the proposed approach is usable in real-robot experiments without any refinement steps, even when trained only on a synthetic dataset.
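To make the idea of training on all ground-truth orientations more concrete, the following is a minimal sketch of a Power-Spherical log-density on the unit sphere $S^2$ and of a negative log-likelihood averaged over several ground-truth directions. It is not the authors' implementation: the restriction to $S^2$, the function names `ps_log_density_s2` and `mean_nll`, and the toy directions are assumptions made for illustration. The normalizer follows from integrating $(1 + \mu^\top x)^\kappa$ over the sphere, which gives $2^{\kappa+2}\pi / (\kappa + 1)$.

```python
import numpy as np


def ps_log_density_s2(x, mu, kappa):
    """Log-density of a Power-Spherical distribution on the unit sphere S^2.

    p(x) = (kappa + 1) / (2^(kappa + 2) * pi) * (1 + mu . x)^kappa

    x:     unit vector(s), shape (3,) or (N, 3)
    mu:    unit mean direction, shape (3,)
    kappa: non-negative concentration (0 gives the uniform distribution)
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Clip slightly above -1 so the logarithm stays finite at the antipode.
    dot = np.clip(x @ mu, -1.0 + 1e-12, 1.0)
    log_norm = np.log(kappa + 1.0) - (kappa + 2.0) * np.log(2.0) - np.log(np.pi)
    return kappa * np.log1p(dot) + log_norm


def mean_nll(samples, mu, kappa):
    """Average negative log-likelihood of several ground-truth directions."""
    return -np.mean(ps_log_density_s2(samples, mu, kappa))


# Toy ground-truth directions (hypothetical values, already unit length).
gt_directions = np.array([[0.0, 0.0, 1.0],
                          [0.6, 0.0, 0.8]])

# A mean direction close to the samples yields a lower loss than a distant one.
print(mean_nll(gt_directions, mu=[0.0, 0.0, 1.0], kappa=5.0))
print(mean_nll(gt_directions, mu=[0.0, 0.0, -1.0], kappa=5.0))
```

In this sketch, a larger `kappa` concentrates probability mass around `mu`, so a small predicted `kappa` can be read as high directional uncertainty. How the paper maps network outputs to these parameters is not specified in this summary.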

Paper Structure (26 sections, 16 equations, 6 figures, 1 table)

INTRODUCTION
Related Work
Data-driven 6-DoF Grasp Detection
Scene Representation
Grasp Pose Representation
Problem Statement
Notations
Supervised Learning
METHODS
Grasp Distribution Modeling
Contact Point Distribution
Local Grasp Distribution
Baseline Vector Distribution
Approach Vector Distribution
Two-stage Network
...and 11 more sections

Figures (6)

Figure 1: Schematic overview of the training and inference pipelines of the proposed 6-DoF grasp prediction model. The network is trained to predict the optimal parameter $\theta$ of the grasp distribution $P_\theta(g | V)$ by maximizing the likelihood of ground-truth grasp samples. During inference, the model infers grasps with multiple orientations, together with graspability and collision scores. Green forks indicate 6-DoF grasps.
Figure 2: Contact grasp.
Figure 3: A baseline vector distribution (left), modeled as a Power-Spherical (PS) distribution $PS(\nu_\mathbf{c}, \kappa_\mathbf{c})$; an example of the grasp configuration (bottom right), where red and green forks represent colliding and collision-free grasps, respectively, at a contact point; the graphical model (top right) illustrating the conditional dependency between variables.
Figure 4: The two-stage network architecture.
Figure 5: Predicted grasps from GIGA (left), ours (middle), and our uncertainty prediction (right) on a $medium$ test scene.
...and 1 more figures
