Sim2Real Grasp Pose Estimation for Adaptive Robotic Applications
Dániel Horváth, Kristóf Bocsi, Gábor Erdős, Zoltán Istenes
TL;DR
The paper tackles robust grasp pose estimation for adaptive robotics under non-ideal conditions by introducing two vision-based models, MOGPE Real-Time and MOGPE High-Precision, trained via a sim2real domain randomization pipeline. The approach decomposes grasp pose estimation into a two-stage pipeline: Stage 1 detects and classifies objects, while Stage 2 estimates orientation using per-class CNNs on ROI-cropped inputs, with a High-Precision variant adding a pattern-matching refinement. On real-world robotic grasping, the Real-Time model achieves 80% success and the High-Precision model reaches 96.67% success, complemented by high object-detection ($mAP_{50}$ ~ 98.8%) and orientation accuracy (>99% for most classes). The framework emphasizes industrial usability by enabling fast data generation, minimal domain-specific data, and real-time inference, contributing a practical sim2real toolkit for co-creative cyber-physical manufacturing systems.
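The two-stage decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Detection`, `GraspPose`, `crop`, and `estimate_grasp_poses` are hypothetical, the detector and the per-class orientation models are stand-in callables rather than trained networks, and the grasp point is simply assumed to be the center of the bounding box.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Toy grayscale image as nested lists (rows of pixel values); an assumption for illustration.
Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max bounds exclusive


@dataclass
class Detection:
    label: str  # object class predicted by Stage 1
    box: Box    # region of interest (ROI) predicted by Stage 1


@dataclass
class GraspPose:
    label: str
    center: Tuple[float, float]  # assumed grasp point: center of the ROI
    angle_deg: float             # orientation predicted by Stage 2


def crop(image: Image, box: Box) -> Image:
    """Cut the ROI out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def estimate_grasp_poses(
    image: Image,
    detector: Callable[[Image], List[Detection]],
    orientation_models: Dict[str, Callable[[Image], float]],
) -> List[GraspPose]:
    """Stage 1 detects and classifies; Stage 2 runs the per-class orientation model on each ROI."""
    poses = []
    for det in detector(image):
        roi = crop(image, det.box)
        angle = orientation_models[det.label](roi)  # one model per object class
        x0, y0, x1, y1 = det.box
        poses.append(GraspPose(det.label, ((x0 + x1) / 2, (y0 + y1) / 2), angle))
    return poses


# Stand-in components so the sketch runs end to end (not trained models).
def stub_detector(image: Image) -> List[Detection]:
    return [Detection(label="bracket", box=(1, 1, 3, 3))]


stub_orientation_models = {"bracket": lambda roi: 90.0}
```

Keeping one orientation model per class means each Stage 2 model only ever sees crops of a single object type, which is the property the per-class design relies on. A refinement step such as the pattern matching used by the High-Precision variant would slot in after the `angle` prediction; it is omitted here because the summary does not specify how it works.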
Abstract
Adaptive robotics plays an essential role in achieving truly co-creative cyber-physical systems. In robotic manipulation tasks, one of the biggest challenges is estimating the pose of given workpieces. Even though recent deep-learning-based models show promising results, they require immense datasets for training. In this paper, two vision-based, multi-object grasp pose estimation (MOGPE) models, MOGPE Real-Time and MOGPE High-Precision, are proposed. Furthermore, a sim2real method based on domain randomization is presented to diminish the reality gap and overcome the data shortage. Our methods yielded 80% and 96.67% success rates in a real-world robotic pick-and-place experiment with the MOGPE Real-Time and MOGPE High-Precision models, respectively. Our framework provides an industrial tool for fast data generation and model training and requires minimal domain-specific data.
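As a rough illustration of how domain randomization can replace manual data collection, the sketch below samples random scene parameters for a synthetic renderer. Everything in it is an assumption made for illustration: the parameter names, their ranges, and the `SceneParams`, `sample_scene`, and `generate_dataset` helpers are hypothetical, not the parameters or tooling used in the paper.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SceneParams:
    # All fields and ranges are illustrative assumptions, not values from the paper.
    light_intensity: float   # relative brightness of the scene lighting
    background_hue: float    # hue of the randomized background, in [0, 1]
    object_yaw_deg: float    # object rotation; doubles as the ground-truth orientation label
    camera_height_m: float   # camera distance above the work surface


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene configuration."""
    return SceneParams(
        light_intensity=rng.uniform(0.3, 1.5),
        background_hue=rng.uniform(0.0, 1.0),
        object_yaw_deg=rng.uniform(0.0, 360.0),
        camera_height_m=rng.uniform(0.8, 1.2),
    )


def generate_dataset(n: int, seed: int = 0) -> List[SceneParams]:
    """Sample n scene configurations reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]
```

Because the simulator sets the object's yaw itself, each rendered image comes with its orientation label for free, and varying the nuisance parameters (lighting, background, camera) is what encourages a model trained on such data to tolerate the differences it meets in real images.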
