Synthetic data enables faster annotation and robust segmentation for multi-object grasping in clutter

Dongmyoung Lee; Wei Chen; Nicolas Rojas

TL;DR

The paper tackles the data annotation bottleneck in multi-object grasping under clutter by introducing a hybrid synthetic–real training pipeline. It uses a WGAN-GP-based generator to create self-labeled fruit crops and composited scenes, enabling efficient instance segmentation with limited real data and translating to improved labelling and grasp success in real-world tasks. Results show that the Gen-hybrid approach outperforms real-only and CP-hybrid baselines, particularly when real data is scarce, and demonstrates robust grasping in clutter without relying on CAD models. This work points to a practical, data-efficient path for training perception systems for robotic manipulation in unstructured environments.

Abstract

Object recognition and object pose estimation remain significant challenges in robotic grasping, since building a labelled dataset is time-consuming and costly in terms of both data collection and annotation. In this work, we propose a synthetic data generation method that minimizes human intervention and makes downstream image segmentation algorithms more robust by combining a generated synthetic dataset with a smaller real-world dataset (hybrid dataset). Annotation experiments show that the proposed synthetic scene generation can reduce labelling time dramatically. An RGB image segmentation model is trained on the hybrid dataset and combined with depth information to produce pixel-to-point correspondences for the individual segmented objects. The object to grasp is then determined by the confidence score of the segmentation algorithm. Pick-and-place experiments demonstrate that segmentation trained on our hybrid dataset achieves (98.9%, 70%) in terms of labelling and grasping success rate, respectively, outperforming segmentation trained on the real dataset by (6.7%, 18.8%) and on a publicly available dataset by (2.8%, 10%). Supplementary material is available at https://sites.google.com/view/synthetic-dataset-generation.
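
The localization and target-selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera with hypothetical intrinsics (fx, fy, cx, cy), a depth image aligned to the RGB image, and that the grasp target is the instance with the highest confidence score. The names Instance, backproject, select_target and centroid are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    label: str
    score: float                    # segmentation confidence (assumed in [0, 1])
    pixels: List[Tuple[int, int]]   # (row, col) pixels covered by the mask


def backproject(pixels, depth, fx, fy, cx, cy):
    """Map mask pixels to 3D camera-frame points using a pinhole model."""
    points = []
    for v, u in pixels:
        z = depth[v][u]
        if z <= 0:                  # skip pixels without a valid depth reading
            continue
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points


def select_target(instances):
    """Assumption: the most confident detection is grasped first."""
    return max(instances, key=lambda inst: inst.score)


def centroid(points):
    """Mean of a non-empty list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))
```

In this sketch, the 3D points of the selected instance (or their centroid) would be handed to whatever grasp planner is in use; the abstract does not specify that step, so it is left out here.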

Paper Structure (15 sections, 11 figures, 4 tables)

Introduction
Related Works
Object detection and segmentation algorithms
Multi-object manipulation
Methodology
Object-wise image generation method
Self-annotated synthetic scene production algorithm
Experiments
Annotation elapsed time
Instance segmentation performance
Real-world pick-and-place operation
Object localization
Target selection
Real-world demonstration
Conclusion

Figures (11)

Figure 1: Robot learns to grasp multiple objects in clutter and sort them into target boxes with the proposed instance segmentation algorithm.
Figure 2: The overall procedure for fruit grasping in clutter.
Figure 3: The network architecture of WGAN-GP algorithm.
Figure 4: Sample output of generated fruits using WGAN-GP algorithm.
Figure 5: Synthetic scene is generated by randomly pasting object-wise images into the background scenes. (A): Generated fruit images and segmented pixels representing the target object. (B): Synthetic scenes with these instances.
...and 6 more figures
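
The scene composition described in the Figure 5 caption (randomly pasting object-wise images into background scenes, with the pasted pixels doubling as the annotation) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: images are plain nested lists, each crop comes with a binary mask of the same size, every crop fits inside the background, and later pastes are allowed to occlude earlier ones. The function name compose_scene and its arguments are hypothetical.

```python
import random


def compose_scene(background, crops, rng):
    """Paste (pixels, mask) crops at random positions.

    background: H x W grid of pixel values
    crops: list of (pixels, mask) pairs of equal-sized grids; mask is 0/1
    Returns the composited scene and a label map that holds 0 for
    background and k for the k-th pasted crop (1-based).
    """
    height, width = len(background), len(background[0])
    scene = [row[:] for row in background]
    labels = [[0] * width for _ in range(height)]
    for instance_id, (pixels, mask) in enumerate(crops, start=1):
        h, w = len(mask), len(mask[0])
        top = rng.randint(0, height - h)     # random placement, fully in frame
        left = rng.randint(0, width - w)
        for r in range(h):
            for c in range(w):
                if mask[r][c]:               # copy only the object pixels
                    scene[top + r][left + c] = pixels[r][c]
                    labels[top + r][left + c] = instance_id
    return scene, labels


if __name__ == "__main__":
    bg = [[0] * 6 for _ in range(6)]
    fruit = ([[7, 7], [7, 7]], [[1, 1], [0, 1]])
    scene, labels = compose_scene(bg, [fruit], random.Random(0))
    for row in labels:
        print(row)
```

Because the label map is written at the same time as the pixels, no manual annotation is needed for the synthetic scene; an occluded instance simply keeps fewer labelled pixels than its mask area.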
