Sim2Real Grasp Pose Estimation for Adaptive Robotic Applications

Dániel Horváth, Kristóf Bocsi, Gábor Erdős, Zoltán Istenes

TL;DR

The paper tackles robust grasp pose estimation for adaptive robotics under non-ideal conditions by introducing two vision-based models, MOGPE Real-Time and MOGPE High-Precision, trained via a sim2real domain randomization pipeline. The approach decomposes grasp pose estimation into two stages: Stage 1 detects and classifies objects, while Stage 2 estimates orientation using per-class CNNs on ROI-cropped inputs. The High-Precision variant adds a pattern-matching refinement step. In real-world robotic grasping, the Real-Time model achieves an 80% success rate and the High-Precision model reaches 96.67%, complemented by high object-detection accuracy ($mAP_{50} \approx 98.8\%$) and orientation accuracy (>99% for most classes). The framework emphasizes industrial usability through fast data generation, minimal domain-specific data, and real-time inference, contributing a practical sim2real toolkit for co-creative cyber-physical manufacturing systems.
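The two-stage decomposition summarized above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Detection` and `GraspPose` types, the `detector` callable, the per-class `orientation_models` mapping, the use of the box centre as the grasp position, and the single in-plane angle are all assumptions made for the example. Images are represented as nested lists of rows to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
Image = Sequence[Sequence]       # rows of pixels (hypothetical representation)


@dataclass
class Detection:
    """One Stage 1 output: a class label and a bounding box."""
    class_id: int
    box: Box


@dataclass
class GraspPose:
    """Assumed planar grasp pose: box centre plus an in-plane angle."""
    class_id: int
    center: Tuple[float, float]
    angle_deg: float


def crop_roi(image: Image, box: Box) -> List[list]:
    """Crop the region of interest, clamping the box to the image bounds."""
    height, width = len(image), len(image[0])
    x_min, y_min, x_max, y_max = box
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


def estimate_grasp_poses(
    image: Image,
    detector: Callable[[Image], List[Detection]],
    orientation_models: Dict[int, Callable[[List[list]], float]],
) -> List[GraspPose]:
    """Stage 1: detect and classify. Stage 2: per-class orientation on the ROI."""
    poses = []
    for det in detector(image):
        roi = crop_roi(image, det.box)
        # Each class has its own orientation estimator (a CNN in the paper,
        # any callable returning an angle in degrees in this sketch).
        angle = orientation_models[det.class_id](roi)
        x_min, y_min, x_max, y_max = det.box
        center = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
        poses.append(GraspPose(det.class_id, center, angle % 360.0))
    return poses
```

Keeping the orientation estimators in a dictionary keyed by class mirrors the per-class design: the detector's label selects which Stage 2 model sees the cropped input. A refinement step such as the High-Precision variant's pattern matching would be applied to each pose after this function returns.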

Abstract

Adaptive robotics plays an essential role in achieving truly co-creative cyber-physical systems. In robotic manipulation tasks, one of the biggest challenges is estimating the pose of given workpieces. Even though recent deep-learning-based models show promising results, they require an immense dataset for training. In this paper, two vision-based, multi-object grasp pose estimation (MOGPE) models are proposed: MOGPE Real-Time and MOGPE High-Precision. Furthermore, a sim2real method based on domain randomization is proposed to diminish the reality gap and overcome the data shortage. Our methods yielded success rates of 80% and 96.67% in a real-world robotic pick-and-place experiment with the MOGPE Real-Time and the MOGPE High-Precision models, respectively. Our framework provides an industrial tool for fast data generation and model training and requires minimal domain-specific data.
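To illustrate the general idea of domain randomization mentioned in the abstract, the sketch below samples randomized scene parameters for synthetic data generation. It is a generic illustration under stated assumptions, not the paper's pipeline: the `SceneParams` fields, their value ranges, and the function names are all hypothetical, and a real setup would pass each sample to a renderer or simulator.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SceneParams:
    """Hypothetical parameters of one synthetic scene."""
    object_class: int
    object_angle_deg: float  # ground-truth label, known exactly in simulation
    light_intensity: float   # assumed relative brightness factor
    camera_height_m: float   # assumed small jitter around a nominal height
    background_id: int       # index into an assumed pool of backgrounds


def sample_scene(rng: random.Random, num_classes: int, num_backgrounds: int) -> SceneParams:
    """Draw every nuisance factor independently from a wide range."""
    return SceneParams(
        object_class=rng.randrange(num_classes),
        object_angle_deg=rng.uniform(0.0, 360.0),
        light_intensity=rng.uniform(0.3, 1.5),
        camera_height_m=rng.uniform(0.55, 0.65),
        background_id=rng.randrange(num_backgrounds),
    )


def generate_dataset(
    n: int, seed: int = 0, num_classes: int = 3, num_backgrounds: int = 10
) -> List[SceneParams]:
    """Sample n scenes reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_scene(rng, num_classes, num_backgrounds) for _ in range(n)]
```

Because the object class and angle are sampled rather than measured, every synthetic image comes with an exact label, which is what lets this kind of pipeline sidestep manual annotation. Seeding the generator makes a dataset reproducible across runs.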

Paper Structure (15 sections, 7 figures, 3 tables)

This paper contains 15 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Illustration of our multi-object grasp pose estimation method.
  • Figure 2: The data flow of the ROI cropping method.
  • Figure 3: The proposed CNN architecture for orientation estimation.
  • Figure 4: Some examples of the generated synthetic training dataset.
  • Figure 5: The robot control architecture. The version using the MOGPE RT model is shown in blue, and the version using the MOGPE HP model is shown in orange.
  • ...and 2 more figures