Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation

Kaixin Bai; Lei Zhang; Zhaopeng Chen; Fang Wan; Jianwei Zhang

TL;DR

This paper tackles the data bottleneck in industrial robotic perception by introducing a physically-based structured-light synthesis pipeline that generates photorealistic RGBD data with ground-truth annotations. Using Blender Cycles and ray-traced Gray-code pattern projection, it produces realistic depth images that exhibit structured-light noise, enabling effective sim2real transfer for object detection, instance segmentation, and robotic grasping. The authors show that depth-based inputs and domain-adaptation strategies reduce the sim2real gap, improving real-world performance and shortening development time compared with domain-randomized approaches. The work offers a practical, scalable tool for deploying deep learning in industry and identifies future work: expanding the dataset and optimizing the pose estimation and grasping algorithms.
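The TL;DR mentions ray-traced Gray-code projection. As a minimal sketch of how Gray-code structured light encodes projector columns, the Python below generates vertical-stripe Gray-code patterns and decodes a stack of captured images back into a per-pixel projector column. It assumes every pattern is also captured in inverted form and that a bit is read by comparing the two images; pixels with too little contrast between them (shadows, occlusions, dark or specular surfaces) are marked invalid, which is one way holes and decoding noise arise in structured-light depth. The function names, the `min_contrast` threshold, and the inverse-pattern scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gray_code_patterns(width: int, height: int) -> np.ndarray:
    """Vertical-stripe Gray-code patterns, shape (n_bits, height, width), values 0/1.

    Pattern k holds bit (n_bits - 1 - k) of each projector column's Gray code,
    so the most significant bit is projected first.
    """
    n_bits = max(1, int(np.ceil(np.log2(width))))
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)  # binary-reflected Gray code per column
    shifts = np.arange(n_bits - 1, -1, -1)[:, None]
    bits = (gray[None, :] >> shifts) & 1  # (n_bits, width)
    return np.repeat(bits[:, None, :], height, axis=1).astype(np.uint8)


def decode_gray(captured, captured_inv, min_contrast=0.1):
    """Decode captured images (n_bits, H, W) into projector column indices.

    Assumption (hypothetical scheme): each pattern is also projected inverted,
    and a bit is 1 where the pattern image is brighter than its inverse.
    Pixels whose contrast falls below `min_contrast` in any bit are marked
    invalid and get column -1.
    """
    captured = np.asarray(captured, dtype=np.float64)
    captured_inv = np.asarray(captured_inv, dtype=np.float64)
    gray_bits = (captured > captured_inv).astype(np.int64)
    valid = np.all(np.abs(captured - captured_inv) >= min_contrast, axis=0)

    # Gray -> binary, MSB first: b[0] = g[0], b[k] = b[k-1] XOR g[k]
    binary = np.empty_like(gray_bits)
    binary[0] = gray_bits[0]
    for k in range(1, gray_bits.shape[0]):
        binary[k] = binary[k - 1] ^ gray_bits[k]

    weights = 1 << np.arange(gray_bits.shape[0] - 1, -1, -1)
    column = np.tensordot(weights, binary, axes=1)  # (H, W) integer columns
    column[~valid] = -1
    return column, valid
```

With an ideal, noise-free capture (`captured = patterns`, `captured_inv = 1 - patterns`) the decoder recovers every column exactly. In a physically-based render, shading, interreflections, and sensor noise perturb the captured intensities, so some bits flip or fail the contrast test; because neighbouring Gray codes differ in a single bit, a flipped bit near a stripe edge tends to shift the decoded column by only one.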

Abstract

Despite the substantial progress in deep learning, its adoption in industrial robotics projects remains limited, primarily due to challenges in data acquisition and labeling. Previous sim2real approaches using domain randomization require extensive scene and model optimization. To address these issues, we introduce an innovative physically-based structured light simulation system, generating both RGB and physically realistic depth images, surpassing previous dataset generation tools. We create an RGBD dataset tailored for robotic industrial grasping scenarios and evaluate it across various tasks, including object detection, instance segmentation, and embedding sim2real visual perception in industrial robotic grasping. By reducing the sim2real gap and enhancing deep learning training, we facilitate the application of deep learning models in industrial settings. Project details are available at https://baikaixinpublic.github.io/structured light 3D synthesizer/.
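The abstract contrasts the proposed system with domain randomization, and the Figure 5 caption lists the parameters randomized in the Isaac Sim baseline: scene background, number of objects, object poses, and light intensity and color. The sketch below samples one such scene configuration in plain Python. It does not use the Isaac Sim API; `SceneParams`, `sample_scene`, and every numeric range are hypothetical placeholders chosen for illustration. Orientations are drawn as uniformly distributed unit quaternions (Shoemake's method), because independently uniform Euler angles do not give uniformly distributed rotations.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    background: str
    # One (position xyz in metres, unit quaternion xyzw) pair per object.
    object_poses: list
    light_intensity: float
    light_rgb: tuple


def random_unit_quaternion(rng: random.Random) -> tuple:
    """Uniformly distributed rotation as (x, y, z, w), using Shoemake's method."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    return (a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2),
            b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3))


def sample_scene(rng, backgrounds, max_objects=10,
                 workspace_xy=(0.4, 0.3), drop_height=(0.05, 0.3),
                 intensity_range=(200.0, 1500.0)):
    """Sample one hypothetical domain-randomized scene configuration.

    All ranges are illustrative placeholders, not values from the paper.
    """
    poses = []
    for _ in range(rng.randint(1, max_objects)):  # object count, bounds inclusive
        position = (rng.uniform(-workspace_xy[0] / 2, workspace_xy[0] / 2),
                    rng.uniform(-workspace_xy[1] / 2, workspace_xy[1] / 2),
                    rng.uniform(*drop_height))
        poses.append((position, random_unit_quaternion(rng)))
    return SceneParams(
        background=rng.choice(backgrounds),
        object_poses=poses,
        light_intensity=rng.uniform(*intensity_range),
        light_rgb=tuple(rng.uniform(0.5, 1.0) for _ in range(3)),
    )
```

For example, `sample_scene(random.Random(0), ["wood", "concrete"])` returns one configuration, and seeding the generator makes a dataset reproducible. The abstract's point is that tuning ranges like these until a model transfers to real data is the costly "scene and model optimization" that the physically-based approach aims to avoid.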

Paper Structure

This paper contains 20 sections, 2 equations, 7 figures, and 1 table.

Introduction
Related Work
Synthetic Dataset Generation
Sim2real Gap
Physically-based Synthetic Depth Sensor Simulation
Method
Physically-Based Rendering
Pattern Projection
3D Reconstruction with Gray Code Pattern
Experiments
Object Database
Synthetic and Real-world Datasets
Object Detection and Instance Segmentation Experiments
Robotic Grasping Experiment
Results
...and 5 more sections

Figures (7)

Figure 1: Pipeline of physically-based sim2real transfer learning. We build a physically-based simulator with gravity to generate realistic data of cluttered scenes. We then use ray tracing to project structured-light patterns, and decode the captured images to reconstruct physically realistic depth images with ground-truth annotations for instance segmentation. Next, we train the visual perception model on simulated data only and apply sim2real transfer learning to the trained model so that it performs well in real-world scenes. Finally, we apply pose estimation to obtain object poses and perform model-based robotic grasping.
Figure 2: System architecture of the proposed pattern projection simulation and 3D reconstruction. First, we generate a scene using the proposed physically-based rendering technique. We then project a sequence of Gray-code patterns into the scene; the captured patterns serve as binary images for 3D reconstruction. The simulated structured-light setup consists of one projector and one camera, which captures the scene under each projected pattern. Finally, our method estimates a synthetic depth image with realistic structured-light noise.
Figure 3: Object database. a) Metal Workpiece 1. b) Metal Circle Workpiece 2. c) YellowSaltCube (KIT object models database). d) HygieneSpray (KIT object models database). e) Toothpaste (KIT object models database). f) The corresponding real objects.
Figure 4: Qualitative visualization of visual perception results on the real-world dataset, using models trained on our synthetic datasets. The model that takes depth images as input performs better than the model that takes texture images as input.
Figure 5: Creation of a domain-randomized dataset on the Isaac Sim platform. Scene backgrounds, the number of objects, object poses, and lighting intensity and color are each randomized to increase the diversity of the dataset.
...and 2 more figures
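Figure 2 ends with estimating a depth image from the decoded patterns. Under a simplifying assumption not stated in the paper (a rectified camera-projector pair with parallel optical axes, a shared focal length f in pixels, and the projector offset by a baseline b along the camera's +x axis), a camera pixel in column u_c that decodes to projector column u_p has disparity d = (u_c - c_cam) - (u_p - c_proj) and depth Z = f * b / d, where c_cam and c_proj are the principal-point columns. The sketch below applies this per pixel; a general calibrated setup would instead intersect each camera ray with the projector's light plane. Function and parameter names are hypothetical.

```python
import numpy as np


def depth_from_columns(column, f_px, baseline_m, cx_cam, cx_proj):
    """Depth (metres) per camera pixel from decoded projector columns.

    Assumes a rectified camera-projector pair: parallel optical axes, shared
    focal length f_px (pixels), projector offset by baseline_m along +x.
    `column` is (H, W) with -1 marking undecodable pixels; those pixels and
    any with non-positive disparity are returned as NaN.
    """
    column = np.asarray(column, dtype=np.float64)
    u_cam = np.arange(column.shape[1], dtype=np.float64)[None, :]  # camera columns
    disparity = (u_cam - cx_cam) - (column - cx_proj)  # broadcasts to (H, W)
    depth = np.full(column.shape, np.nan)
    ok = (column >= 0) & (disparity > 0)
    depth[ok] = f_px * baseline_m / disparity[ok]  # Z = f * b / d
    return depth
```

Because plain Gray-code decoding yields integer projector columns, depth computed this way is quantized, and since Z = f * b / d the step between neighbouring depth values grows roughly with Z squared; this is one source of the stair-stepping typical of structured-light depth.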
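Figure 1 notes that the synthetic depth images come with ground-truth annotations for instance segmentation. One common way to obtain these from a renderer is a per-pixel object-index image (for example, an object-index render pass in which each object is assigned a unique nonzero index). The sketch below converts such an image into per-instance binary masks and inclusive pixel bounding boxes. The input format, the background id of 0, and the function name are assumptions for illustration, not the authors' annotation code.

```python
import numpy as np


def instances_from_index_map(index_map, background_id=0):
    """Per-instance masks and boxes from an (H, W) integer object-index image.

    Assumes one unique id per object instance and `background_id` elsewhere.
    Masks cover visible pixels only, so occluded parts are not included.
    Returns {id: (bool mask, (x_min, y_min, x_max, y_max))} with inclusive boxes.
    """
    index_map = np.asarray(index_map)
    annotations = {}
    for obj_id in np.unique(index_map):
        if obj_id == background_id:
            continue
        mask = index_map == obj_id
        ys, xs = np.nonzero(mask)  # row (y) and column (x) coordinates
        box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
        annotations[int(obj_id)] = (mask, box)
    return annotations
```

Because the annotations are derived from the same render as the images, they are pixel-exact by construction, which is the labeling burden the abstract identifies as a barrier to industrial adoption.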
