Hybrid Synthetic Data Generation with Domain Randomization Enables Zero-Shot Vision-Based Part Inspection Under Extreme Class Imbalance

Ruo-Syuan Mei, Sixian Jia, Guangze Li, Soo Yeon Lee, Brian Musser, William Keller, Sreten Zakula, Jorge Arinez, Chenhui Shao

TL;DR

The paper tackles the data bottleneck and extreme class imbalance in vision-based industrial part inspection by proposing a hybrid synthetic data generation framework that blends simulation-based rendering, domain randomization, and real-background compositing to enable zero-shot learning. A two-stage architecture using YOLOv8n for detection and MobileNetV3-small for binary quality classification is trained solely on synthetic data and validated on 300 real parts, achieving mAP@0.5 of 0.995, 96.0% accuracy, and 90.1% balanced accuracy. Compared with few-shot real-data baselines, the SDG approach yields up to 23.3% higher balanced accuracy under both balanced and imbalanced conditions. This annotation-free, scalable method reduces labeling costs and enables rapid deployment of robust inspection systems across new products.
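The compositing step is what makes the data annotation-free: when a rendered part is pasted onto a real background at a known position, its bounding box follows directly from the alpha mask. The following is a minimal NumPy sketch of that idea, not the authors' pipeline. The function names `composite` and `random_composite`, the RGBA input layout, and the brightness range are assumptions made for illustration.

```python
import numpy as np


def composite(part_rgba, background, top, left):
    """Alpha-blend an RGBA part render onto an RGB background.

    Returns the composited uint8 image and the part's bounding box as
    (x_min, y_min, x_max, y_max), with exclusive max coordinates.
    """
    h, w = part_rgba.shape[:2]
    out = background.astype(np.float32)  # astype returns a copy
    alpha = part_rgba[..., 3:4].astype(np.float32) / 255.0
    region = out[top:top + h, left:left + w]  # view into `out`
    region[:] = alpha * part_rgba[..., :3] + (1.0 - alpha) * region

    # The label comes for free: the box is the extent of non-zero alpha.
    ys, xs = np.nonzero(part_rgba[..., 3])
    bbox = (int(left + xs.min()), int(top + ys.min()),
            int(left + xs.max() + 1), int(top + ys.max() + 1))
    return np.clip(out, 0, 255).astype(np.uint8), bbox


def random_composite(part_rgba, background, rng):
    """Randomize placement and brightness (an illustrative choice of factors)."""
    h, w = part_rgba.shape[:2]
    top = int(rng.integers(0, background.shape[0] - h + 1))
    left = int(rng.integers(0, background.shape[1] - w + 1))
    gain = rng.uniform(0.7, 1.3)  # hypothetical brightness range
    part = part_rgba.astype(np.float32)
    part[..., :3] = np.clip(part[..., :3] * gain, 0, 255)
    return composite(part.astype(np.uint8), background, top, left)


# Toy data: a 4x4 render whose opaque 2x2 centre has intensity 200.
part = np.zeros((4, 4, 4), dtype=np.uint8)
part[1:3, 1:3] = (200, 200, 200, 255)
background = np.zeros((8, 8, 3), dtype=np.uint8)

image, bbox = composite(part, background, top=2, left=3)
rand_image, rand_bbox = random_composite(part, background, np.random.default_rng(0))
```

In this toy case the opaque pixels land at rows 3 to 4 and columns 4 to 5 of the background, so `bbox` is `(4, 3, 6, 5)`. A real pipeline would draw `part` from a renderer and `background` from a library of real images.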

Abstract

Machine learning, particularly deep learning, is transforming industrial quality inspection. Yet, training robust machine learning models typically requires large volumes of high-quality labeled data, which are expensive, time-consuming, and labor-intensive to obtain in manufacturing. Moreover, defective samples are intrinsically rare, leading to severe class imbalance that degrades model performance. These data constraints hinder the widespread adoption of machine learning-based quality inspection methods in real production environments. Synthetic data generation (SDG) offers a promising solution by enabling the creation of large, balanced, and fully annotated datasets in an efficient, cost-effective, and scalable manner. This paper presents a hybrid SDG framework that integrates simulation-based rendering, domain randomization, and real background compositing to enable zero-shot learning for computer vision-based industrial part inspection without manual annotation. The SDG pipeline generates 12,960 labeled images in one hour by varying part geometry, lighting, and surface properties, and then compositing synthetic parts onto real image backgrounds. A two-stage architecture utilizing a YOLOv8n backbone for object detection and MobileNetV3-small for quality classification is trained exclusively on synthetic data and evaluated on 300 real industrial parts. The proposed approach achieves an mAP@0.5 of 0.995 for detection, 96% classification accuracy, and 90.1% balanced accuracy. Comparative evaluation against few-shot real-data baseline approaches demonstrates significant improvement. The proposed SDG-based approach achieves 90-91% balanced accuracy under severe class imbalance, while the baselines reach only 50% accuracy. These results demonstrate that the proposed method enables annotation-free, scalable, and robust quality inspection for real-world manufacturing applications.
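The gap between plain accuracy and balanced accuracy matters under class imbalance. The sketch below assumes the common definition of balanced accuracy as the mean of per-class recall; the excerpt does not state the paper's exact formula. The 270/30 split of good and defective parts is a hypothetical example, not the paper's test-set composition. It shows that a classifier that always predicts "good" scores high accuracy but exactly 50% balanced accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (assumed definition), so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced test set: 270 good parts (0), 30 defective parts (1).
y_true = [0] * 270 + [1] * 30

# A degenerate classifier that labels every part as good.
y_pred = [0] * 300

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.3f}, balanced accuracy={bal_acc:.3f}")
```

Here the recall is 1.0 on the good class and 0.0 on the defective class, so the balanced accuracy is 0.5 even though 90% of predictions are correct. A balanced accuracy near 50% therefore indicates a model that has learned little beyond the majority class.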

Paper Structure

This paper contains 20 sections, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Proposed hybrid SDG pipeline for part quality inspection.
  • Figure 2: Illustration of 3D simulation engine.
  • Figure 3: Domain randomization.
  • Figure 4: Background library.
  • Figure 5: Examples of synthetic part images.
  • ...and 6 more figures