Investigation of the Impact of Synthetic Training Data in the Industrial Application of Terminal Strip Object Detection

Nico Baumgart, Markus Lange-Hegermann, Mike Mücke

TL;DR

This work targets the sim-to-real gap in industrial visual inspection by building an image synthesis pipeline that blends domain knowledge with domain randomization. The pipeline generates 30,000 synthetic terminal-strip images, and 300 manually annotated real images are used for evaluation. Four mainstream detectors (RetinaNet, Faster R-CNN, YOLOv8, and DINO) are trained solely on synthetic data and tested on both synthetic and real datasets, revealing a sizable gap when real images are unscaled. By applying scaling-based preprocessing and per-image optimization, the authors reduce the gap to under 2%, with DINO achieving a real-world $mAP@0.5$ of $98.40\%$, demonstrating the practicality of the approach. The study also provides a public dataset to benchmark sim-to-real methods in dense industrial object settings and outlines guidelines for transferring the pipeline to other industrial inspection tasks.

Abstract

In industrial manufacturing, deploying deep learning models for visual inspection is mostly hindered by the high and often intractable cost of collecting and annotating large-scale training datasets. While image synthesis from 3D CAD models is a common solution, the individual techniques of domain and rendering randomization for creating rich synthetic training datasets have been studied mainly in simple domains. Hence, their effectiveness on complex industrial tasks with densely arranged and similar objects remains unclear. In this paper, we investigate the sim-to-real generalization performance of standard object detectors on the complex industrial application of terminal strip object detection, carefully combining randomization and domain knowledge. We describe step by step the creation of our image synthesis pipeline, which achieves high realism with minimal implementation effort, and explain how this approach could be transferred to other industrial settings. Moreover, we created a dataset comprising 30,000 synthetic images and 300 manually annotated real images of terminal strips, which is publicly available for reference and future research. To provide a baseline as a lower bound of the expected performance in these challenging industrial parts detection tasks, we show the sim-to-real generalization performance of standard object detectors on our dataset based on fully synthetic training. While all considered models behave similarly, the transformer-based DINO model achieves the best score with 98.40% mean average precision on the real test set, demonstrating that our pipeline enables high-quality detections in complex industrial environments from existing CAD data and with a manageable image synthesis effort.
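
The $mAP@0.5$ figure quoted in the TL;DR rests on an intersection-over-union (IoU) test: a predicted box counts as a correct detection only if its IoU with a ground-truth box reaches 0.5. The sketch below illustrates that criterion only; it is not the authors' evaluation code. The `(x_min, y_min, x_max, y_max)` box format, the function names, and the "at least 0.5" comparison are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """True if the prediction overlaps the ground truth enough to count as a hit."""
    return iou(pred, truth) >= threshold


# The same 4-pixel horizontal offset applied to a narrow and a wide box.
narrow_truth, narrow_pred = (0, 0, 10, 100), (4, 0, 14, 100)
wide_truth, wide_pred = (0, 0, 100, 100), (4, 0, 104, 100)

print(is_match(narrow_pred, narrow_truth))  # narrow box: IoU = 600 / 1400
print(is_match(wide_pred, wide_truth))      # wide box:   IoU = 9600 / 10400
```

With these hypothetical boxes, the same offset drops the narrow box below the threshold while the wide box stays well above it. This is one reason accurate bounding boxes matter for narrow objects such as terminals.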

Paper Structure

This paper contains 16 sections, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: Example rendering of a terminal strip randomly generated by the process described in Subsection \ref{subsec:creation}. It shows the centered front view that is most informative in this use case.
  • Figure 2: Visualization of 1000 sampled camera positions in the form of a point cloud to illustrate the procedure of randomizing viewpoints.
  • Figure 3: Example $360^{\circ}$ HDRI from Poly Haven used for the image-based lighting.
  • Figure 4: Illustration of the semi-ellipsoid used to visualize the shadows of the terminal strips and onto which the HDRIs are mapped to create the image background.
  • Figure 5: Four different annotation scenarios that illustrate the difficulty of determining accurate bounding boxes for narrow objects, even in image synthesis.
  • ...and 4 more figures