Training a Computer Vision Model for Commercial Bakeries with Primarily Synthetic Images

Thomas H. Schmitt; Maximilian Bundscherer; Tobias Bocklet

TL;DR

This work extends an AI application that automates the tracking of returned bread buns by creating an expanded dataset of 2432 images covering a wider range of baked goods, and uses the generative models pix2pix and CycleGAN to create synthetic images that increase model robustness.

Abstract

In the food industry, reprocessing returned products is a vital step to increase resource efficiency. [SBB23] presented an AI application that automates the tracking of returned bread buns. We extend their work by creating an expanded dataset comprising 2432 images and a wider range of baked goods. To increase model robustness, we use the generative models pix2pix and CycleGAN to create synthetic images. We train the state-of-the-art object detection models YOLOv9 and YOLOv8 on our detection task. Our overall best-performing model achieved an average precision AP@0.5 of 90.3% on our test set.

Paper Structure (11 sections, 4 figures, 1 table)

This paper contains 11 sections, 4 figures, 1 table.

Introduction
Related Works
Data
Image Synthesis
Copy-Paste Augmentation
Generative Models
Experiments and Results
Experimental Setups
Experiments
Conclusions
Future Work

Figures (4)

Figure 1: Relative baked good type distributions in our training and test sets.
Figure 2: Left: Image of a Bauernbrot (farmer's bread). Right: Synthetic image generated by the Copy-Paste augmentation pipeline.
Figure 3: Left: Original image of an Apfeltasche (apple turnover). Middle: The corresponding synthetic image generated by a model trained on images with a drying tray background. Right: The synthetic image generated by a model trained on images without a background.
Figure 4: Left: Original image of a Baguettesemmel (baguette bun). Middle: The corresponding synthetic image generated by our trained pix2pix model. Right: The synthetic image generated by our trained CycleGAN model.
