Sim2Real Grasp Pose Estimation for Adaptive Robotic Applications

Dániel Horváth; Kristóf Bocsi; Gábor Erdős; Zoltán Istenes

TL;DR

The paper tackles robust grasp pose estimation for adaptive robotics under non-ideal conditions by introducing two vision-based models, MOGPE Real-Time and MOGPE High-Precision, trained via a sim2real domain randomization pipeline. Grasp pose estimation is decomposed into two stages: Stage 1 detects and classifies objects, and Stage 2 estimates orientation using per-class CNNs on ROI-cropped inputs. The High-Precision variant adds a pattern-matching refinement step. In real-world robotic grasping, the Real-Time model achieves an 80% success rate and the High-Precision model a 96.67% success rate, alongside high object-detection accuracy ($mAP_{50}$ of about 98.8%) and orientation accuracy (>99% for most classes). The framework emphasizes industrial usability through fast data generation, minimal domain-specific data, and real-time inference, contributing a practical sim2real toolkit for co-creative cyber-physical manufacturing systems.
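The two-stage decomposition described in this summary can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Detection` and `GraspPose` types, the `crop_roi` and `estimate_grasp_poses` helpers, and the choice of the bounding-box center as the grasp position are all assumptions made for the example. The detector and the per-class orientation models are passed in as plain callables and stand in for the trained networks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Detection:
    """Hypothetical Stage 1 output: a class label and a bounding box."""
    label: str
    box: Box


@dataclass
class GraspPose:
    """Hypothetical result type: planar position plus in-plane angle."""
    label: str
    x: float
    y: float
    angle_deg: float


def crop_roi(image: Image, box: Box) -> Image:
    """Cut the region of interest out of the full image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def estimate_grasp_poses(
    image: Image,
    detector: Callable[[Image], List[Detection]],
    orientation_models: Dict[str, Callable[[Image], float]],
) -> List[GraspPose]:
    """Stage 1 detects and classifies; Stage 2 runs the model for that class."""
    poses = []
    for det in detector(image):
        roi = crop_roi(image, det.box)
        # One orientation estimator per object class, selected by label.
        angle = orientation_models[det.label](roi)
        x0, y0, x1, y1 = det.box
        # Assumption: the box center is used as the grasp position.
        poses.append(GraspPose(det.label, (x0 + x1) / 2, (y0 + y1) / 2, angle))
    return poses
```

Keeping the orientation models in a dictionary keyed by class label mirrors the per-class design: adding a new workpiece type means adding one entry rather than retraining a shared estimator.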

Abstract

Adaptive robotics plays an essential role in achieving truly co-creative cyber-physical systems. In robotic manipulation tasks, one of the biggest challenges is to estimate the pose of given workpieces. Even though recent deep-learning-based models show promising results, they require an immense dataset for training. In this paper, two vision-based, multi-object grasp pose estimation (MOGPE) models are proposed: the MOGPE Real-Time and the MOGPE High-Precision. Furthermore, a sim2real method based on domain randomization is presented to diminish the reality gap and overcome the data shortage. Our methods yielded an 80% and a 96.67% success rate in a real-world robotic pick-and-place experiment with the MOGPE Real-Time and the MOGPE High-Precision models, respectively. Our framework provides an industrial tool for fast data generation and model training and requires minimal domain-specific data.
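As a rough illustration of what domain randomization means for synthetic data generation, the sketch below samples scene parameters at random so that each rendered training image differs in pose, lighting, background, and noise. Everything here is an assumption made for the example: the `SceneParams` fields, the value ranges, and the function names are hypothetical, and the abstract does not state which properties the authors actually randomize. Rendering is left out; only the parameter sampling is shown.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SceneParams:
    """Hypothetical parameters of one synthetic scene (all fields assumed)."""
    object_angle_deg: float          # ground-truth orientation label
    object_xy: Tuple[float, float]   # normalized position in the image
    light_intensity: float           # relative brightness multiplier
    background_id: int               # index into a pool of backgrounds
    noise_std: float                 # std. dev. of additive pixel noise


def sample_scene(rng: random.Random, n_backgrounds: int = 10) -> SceneParams:
    """Draw one randomized scene; the ranges are illustrative only."""
    return SceneParams(
        object_angle_deg=rng.uniform(0.0, 360.0),
        object_xy=(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)),
        light_intensity=rng.uniform(0.3, 1.5),
        background_id=rng.randrange(n_backgrounds),
        noise_std=rng.uniform(0.0, 0.05),
    )


def generate_dataset(n: int, seed: int = 0) -> List[SceneParams]:
    """Sample n scenes reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]
```

Because the label (here, the object angle) is sampled together with the nuisance parameters, every synthetic image comes with exact ground truth at no annotation cost, which is what makes this kind of pipeline attractive when real labeled data is scarce.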

Paper Structure (15 sections, 7 figures, 3 tables)

This paper contains 15 sections, 7 figures, 3 tables.

Introduction
Problem Statement
Related works
Approach
Object Detection (Stage 1)
ROI Cropping
Orientation Estimation (Stage 2)
Pattern Matching
Robot Control Architecture
Results
Setting of the Robotic Experiments
Object Detection
Orientation Estimation
Robotic Grasping
Conclusions and future work

Figures (7)

Figure 1: Illustration of our multi-object grasp pose estimation method.
Figure 2: The data flow of the ROI cropping method
Figure 3: The proposed CNN architecture for orientation estimation
Figure 4: Some examples of the generated synthetic training dataset
Figure 5: The robot control architecture. Blue marks the version used with the MOGPE RT model; orange marks the version used with the MOGPE HP model.
...and 2 more figures
