
Generative AI in Industrial Machine Vision -- A Review

Hans Aoyang Zhou, Dominik Wolfschläger, Constantinos Florides, Jonas Werheid, Hannes Behnen, Jan-Henrick Woltersmann, Tiago C. Pinto, Marco Kemmerling, Anas Abdelrazeq, Robert H. Schmitt

TL;DR

This PRISMA-based review examines how Generative AI (GenAI) is being applied to industrial machine vision by cataloging architectures (primarily GANs and VAEs), data requirements, and task domains. The study analyzes over 1,200 papers (168 retained after screening) and finds that data augmentation is the dominant use of GenAI, with image enhancement and anomaly detection also prominent. It highlights practical transfer challenges, including data diversity, preprocessing needs, training instability, and inference speed, and stresses the importance of cross-domain evidence and domain similarity for successful industrial adoption. The work provides a consolidated view of current advancements, identifies gaps (notably limited diffusion/autoregressive usage and StyleGAN applications), and suggests directions to improve the transferability and industrial impact of GenAI in machine vision.

Abstract

Machine vision enhances automation, quality control, and operational efficiency in industrial applications by enabling machines to interpret and act on visual data. While traditional computer vision algorithms and approaches remain widely used, machine learning has become pivotal in current research. In particular, generative AI shows promising potential for improving pattern recognition capabilities through data augmentation, increased image resolution, and anomaly identification for quality control. However, the application of generative AI in machine vision is still in its early stages due to challenges in data diversity, computational requirements, and the need for robust validation methods. A comprehensive literature review is essential to understand the current state of generative AI in industrial machine vision, focusing on recent advancements, applications, and research trends. Thus, a literature review based on the PRISMA guidelines was conducted, analyzing over 1,200 papers on generative AI in industrial machine vision. Our findings reveal various patterns in current research, with the primary use of generative AI being data augmentation for machine vision tasks such as classification and object detection. Furthermore, we gather a collection of application challenges together with data requirements to enable a successful application of generative AI in industrial machine vision. This overview aims to provide researchers with insights into the different areas and applications within current research, highlighting significant advancements and identifying opportunities for future work.

Paper Structure (22 sections, 7 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Taxonomy of GenAI approaches. Density estimation can be performed either explicitly or implicitly. Adapted from Foster (2023).
  • Figure 2: The VAE architecture with encoder and decoder. The encoder maps the input data $x$ of dimension $D$ to a mean $\mu$ and a standard deviation $\sigma$; combined with noise $\epsilon \sim \mathcal{N}\left( 0,1 \right)$, these yield the latent variable $z = \mu + \sigma \epsilon$. The decoder $\mathcal{G}$, with weights $\Theta$, then decodes the latent variable back into an image $\tilde{x} = \mathcal{G}_{\Theta} \left( z \right)$.
  • Figure 3: The GAN architecture is displayed with both generator $\mathcal{G}_\Theta$ and discriminator $\mathcal{D}_\Theta$, both parameterized with weights $\Theta$. During training, $\mathcal{G}_\Theta$ generates a fake image $\tilde{x} \in \mathbb{R}^D$ from a latent vector $z \in \mathbb{R}^d$. Afterward, $\tilde{x}$ and $x_{Real}$ are both used to train $\mathcal{D}_\Theta$, which tries to predict whether the image is from the real data distribution $p\left( x \right)$ or the fake data distribution $p_{\mathcal{G}}\left( x \right)$.
  • Figure 4: Simplified architecture of the StyleGAN model showing the mapping network $\mathcal{F}$ and the progressively growing synthesis network $\mathcal{G}$ with the different latent spaces.
  • Figure 5: PRISMA flowchart showing the number of publications excluded during study selection.
  • ...and 2 more figures
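
The latent sampling step in the Figure 2 caption, $z = \mu + \sigma \epsilon$ with $\epsilon \sim \mathcal{N}(0,1)$, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `reparameterize` and the example values for `mu` and `sigma` are hypothetical, and a real VAE would obtain them from a trained encoder.

```python
import numpy as np


def reparameterize(mu, sigma, rng):
    """Draw z = mu + sigma * eps element-wise, with eps ~ N(0, 1)."""
    eps = rng.standard_normal(size=mu.shape)
    return mu + sigma * eps


# Hypothetical encoder outputs for a 3-dimensional latent space.
rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0, 2.0])
sigma = np.array([0.1, 0.2, 0.3])

z = reparameterize(mu, sigma, rng)
print(z.shape)
```

Because the randomness enters only through $\epsilon$, $z$ stays a deterministic function of $\mu$ and $\sigma$, which is what lets gradients flow back to the encoder during training.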