Towards Reducing Data Acquisition and Labeling for Defect Detection using Simulated Data

Lukas Malte Kemeter; Rasmus Hvingelby; Paulina Sierak; Tobias Schön; Bishwajit Gosswam

TL;DR

The authors argue that future research into the cost-efficiency of different training strategies is needed to better understand how to allocate budget in applied machine learning projects.

Abstract

In many manufacturing settings, annotating data for machine learning and computer vision is costly, but synthetic data can be generated at significantly lower cost. Substituting real-world data with synthetic data is therefore appealing for many machine learning applications that require large amounts of training data. However, relying solely on synthetic data is frequently inadequate for training models that perform well on real-world data, primarily due to domain shifts between the synthetic and real-world data. We discuss approaches for dealing with such a domain shift when detecting defects in X-ray scans of aluminium wheels. Using both simulated and real-world X-ray images, we train an object detection model with different strategies to identify the training approach that generates the best detection results while minimising the demand for annotated real-world training samples. Our preliminary findings suggest that the sim-2-real domain adaptation approach is more cost-efficient than a fully supervised oracle when the total number of available annotated samples is fixed. Given a certain number of labeled real-world samples, training on a mix of synthetic and unlabeled real-world data achieved comparable or even better detection results at significantly lower cost. We argue that future research into the cost-efficiency of different training strategies is important for a better understanding of how to allocate budget in applied machine learning projects.

Paper Structure (15 sections, 5 figures)

Abstract
Keywords: Domain Adaptation, Object Detection, Defect Detection, Semi-supervised Learning, Unsupervised Learning, Cost-Efficiency
Introduction
Related Work
Setup
Data
Transferability from simulation to real-world
Methodology
Domain Adaptation Architecture
Training Approach for Supervised Learning, UDA, and SSDA
Preliminary results
Improving detection performance with fewer labels
Discussion
Conclusion
Acknowledgements

Figures (5)

Figure 1: A comparison between simulated (left) and real (right) X-ray projections from schoen. The shift between the source and target domain distributions is illustrated by the grey value profile (red line) and by the slightly different position of the spokes in the images. The simulated images have a very smooth profile, whereas the real-world images appear fuzzier.
Figure 2: Overview of the experimental setup
Figure 3: Comparing supervised training with SSDA and UDA for different levels of annotated target data
Figure 4: Example of how the cost can vary depending on whether acquiring the data or labeling it is more expensive
Figure 5: Example of how performance can vary depending on the method chosen and the information available in the data
