Towards Sim-to-Real Industrial Parts Classification with Synthetic Dataset

Xiaomeng Zhu, Talha Bilal, Pär Mårtensson, Lars Hanson, Mårten Björkman, Atsuto Maki

TL;DR

The paper tackles sim-to-real industrial parts classification by introducing SIP-17, a synthetic dataset of 17 industrial objects across six use cases that includes isolated and assembled parts. It leverages domain randomization to create two synthetic training sets, Syn_R with randomized backgrounds and post-processing and Syn_O without, and evaluates five state-of-the-art classifiers trained solely on synthetic data against real-world test images. Results show that Syn_R generally yields better cross-domain performance, with ConvNext achieving the top overall accuracy on isolated parts, while assembled-part tasks remain challenging due to albedo and shape similarities. The work demonstrates the potential of synthetic data for industrial applications and positions SIP-17 as a benchmark to guide future research and dataset expansion in sim-to-real classification of industrial parts.

Abstract

This paper is about effectively utilizing synthetic data for training deep neural networks for industrial parts classification, in particular, by taking into account the domain gap against real-world images. To this end, we introduce a synthetic dataset that may serve as a preliminary testbed for the Sim-to-Real challenge; it contains 17 objects of six industrial use cases, including isolated and assembled parts. A few subsets of objects exhibit large similarities in shape and albedo for reflecting challenging cases of industrial parts. All the sample images come with and without random backgrounds and post-processing for evaluating the importance of domain randomization. We call it Synthetic Industrial Parts dataset (SIP-17). We study the usefulness of SIP-17 through benchmarking the performance of five state-of-the-art deep network models, supervised and self-supervised, trained only on the synthetic data while testing them on real data. By analyzing the results, we deduce some insights on the feasibility and challenges of using synthetic data for industrial parts classification and for further developing larger-scale synthetic datasets. Our dataset and code are publicly available.
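The "with and without random backgrounds and post-processing" split can be illustrated with a minimal sketch. It assumes each synthetic render is available as an RGBA array whose alpha channel masks the part. The function name `randomize`, the noise-based background, and the brightness and noise ranges are hypothetical choices for illustration, not the settings used to build SIP-17.

```python
import numpy as np


def randomize(render_rgba, rng):
    """Composite an RGBA render over a random background, then post-process.

    Hypothetical sketch: the background model and post-processing ranges
    are assumptions, not the SIP-17 generation pipeline.
    """
    h, w, _ = render_rgba.shape
    rgb = render_rgba[..., :3].astype(np.float64) / 255.0
    alpha = render_rgba[..., 3:4].astype(np.float64) / 255.0

    # Random background (assumption): one random base color plus pixel noise.
    base = rng.uniform(0.0, 1.0, size=(1, 1, 3))
    background = np.clip(base + rng.normal(0.0, 0.1, size=(h, w, 3)), 0.0, 1.0)

    # Alpha compositing keeps the part and replaces everything else.
    image = alpha * rgb + (1.0 - alpha) * background

    # Post-processing (assumption): brightness gain and additive Gaussian noise.
    gain = rng.uniform(0.7, 1.3)
    image = image * gain + rng.normal(0.0, 0.02, size=image.shape)

    return (np.clip(image, 0.0, 1.0) * 255.0).round().astype(np.uint8)
```

In this sketch, leaving a render untouched corresponds to a Syn_O-style sample, while passing it through `randomize` with a seeded `np.random.default_rng` yields a Syn_R-style sample.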

Paper Structure (12 sections, 6 figures, 1 table)

Figures (6)

  • Figure 1: Sample images from the SIP-17 dataset, showcasing three categories: Syn_O, synthetic images without random backgrounds and post-processing; Syn_R, synthetic images with random backgrounds and post-processing; and Real, images captured from cameras in real industrial scenarios. Use cases 1-4 require the classification of isolated industrial parts, while use cases 5 and 6 require the classification of assembled parts.
  • Figure 2: Results of all isolated parts.
  • Figure 3: Results of all use cases.
  • Figure 4: Confusion matrices on different use cases with the ConvNext model.
  • Figure 5: Class-wise accuracy (%) of use cases 1 to 4. Purple and blue colors indicate the best and second-best models in terms of total accuracy trained with Syn_R.
  • ...and 1 more figure
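
The confusion matrices and class-wise accuracies listed in the figures above can be computed from predicted and true labels on the real test images. The following is a minimal sketch, assuming integer class labels in the range 0 to `num_classes - 1`. The helper names `confusion_matrix` and `class_wise_accuracy` are illustrative and not taken from the authors' code.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly when an index pair occurs more than once.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def class_wise_accuracy(cm):
    """Per-class accuracy: diagonal over row sums, NaN for classes with no samples."""
    totals = cm.sum(axis=1)
    correct = np.diag(cm)
    out = np.full(len(cm), np.nan)
    return np.divide(correct, totals, out=out, where=totals > 0)


def overall_accuracy(cm):
    """Fraction of all samples that fall on the diagonal."""
    return np.trace(cm) / cm.sum()
```

Large off-diagonal entries between two classes point to the kind of confusion the paper attributes to parts with similar shape and albedo.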