IM-IAD: Industrial Image Anomaly Detection Benchmark in Manufacturing

Guoyang Xie, Jinbao Wang, Jiaqi Liu, Jiayi Lyu, Yong Liu, Chengjie Wang, Feng Zheng, Yaochu Jin

TL;DR

This work introduces IM-IAD, a comprehensive, uniform benchmark for industrial image anomaly detection that covers seven datasets, 19 algorithms, and 17,017 experiments to evaluate IAD across supervision levels, learning paradigms, and efficiency constraints. It provides a plug-and-play, modular implementation, standardized metrics, and open-source code to enable fair comparison and reproducibility. Key findings reveal no universal winner across datasets, the critical role of global feature extraction for logical anomalies, and the value of fully supervised training, rotation augmentation for few-shot scenarios, and memory-bank approaches for continual learning. The benchmark aims to bridge the gap between academic research and industrial deployment by highlighting practical trade-offs and guiding future methodological directions for robust, efficient IAD in manufacturing settings.

Abstract

Image anomaly detection (IAD) is an emerging and vital computer vision task in industrial manufacturing (IM). Recently, many advanced algorithms have been reported, but their performance varies considerably across IM settings. The lack of a uniform IM benchmark hinders the development and use of IAD methods in real-world applications and makes it difficult for researchers to analyze IAD algorithms. To solve this problem, we propose, for the first time, a uniform IM benchmark that assesses how well these algorithms perform across levels of supervision (unsupervised versus fully supervised), learning paradigms (few-shot, continual, and noisy-label), and efficiency (memory usage and inference speed). We then construct a comprehensive image anomaly detection benchmark (IM-IAD), which includes 19 algorithms on seven major datasets under a uniform setting. Extensive experiments (17,017 in total) on IM-IAD provide in-depth insights into IAD algorithm redesign or selection. Moreover, the proposed IM-IAD benchmark challenges existing algorithms and suggests future research directions. To foster reproducibility and accessibility, the source code of IM-IAD is available at https://github.com/M-3LAB/IM-IAD.

Paper Structure

This paper contains 20 sections, 3 equations, 4 figures, 13 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Illustration of IM-IAD. Vanilla unsupervised IAD methods fall into two categories: feature embedding-based and reconstruction-based. (a) Feature embedding-based methods measure the difference between test samples and normal samples at the feature level, while (b) reconstruction-based methods compare the input image with its reconstruction to decide whether it is abnormal. (c) Fully supervised methods use a limited number of annotated abnormal samples to improve model performance. (d) The few-shot setting uses a limited number of normal samples for training. (e) The noisy setting mixes abnormal samples into the training set to evaluate the robustness of the model. (f) The continual setting trains on each task in turn and evaluates how much the model forgets past tasks.
  • Figure 2: Visualization of vanilla IAD algorithms in terms of Image AUC $\uparrow$, inference time, and GPU memory on MVTec AD and LOCO-AD. The Y-axis denotes the performance of the IAD model. The X-axis denotes the inference time per image. The circle size denotes the GPU memory consumption of the IAD model during the test phase; smaller is better.
  • Figure 3: Visualization of representative vanilla IAD algorithms. The three columns on the left (marked in red) show structural anomalies, while the three columns on the right (marked in blue) show logical anomalies. The first row shows training images, all of which are normal. The second row shows an abnormal test image, and the third row reveals the anomalies in that image. The fourth to sixth rows present the heat maps of PatchCore, RD4AD, and DRAEM, respectively.
  • Figure 4: Few-shot IAD Benchmark on MVTec AD and MPDD. The Y-axis refers to the metric value and the X-axis denotes the shot number.
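
The two vanilla families described in the Figure 1 caption, and the Image AUC metric used in Figure 2, can be illustrated with a minimal Python sketch. It is not taken from the IM-IAD code base: the function names, the toy 2-D features, and the choice of Euclidean nearest-neighbor distance and mean squared error are all assumptions made for illustration. Real methods such as PatchCore or DRAEM operate on deep features and learned reconstructions.

```python
import math

def embedding_score(test_feature, normal_bank):
    """Feature embedding-style score (hypothetical helper):
    distance from a test feature to its nearest normal feature."""
    return min(math.dist(test_feature, f) for f in normal_bank)

def reconstruction_score(image, reconstruction):
    """Reconstruction-style score (hypothetical helper):
    mean squared error between a flattened input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(image, reconstruction)) / len(image)

def image_auroc(scores, labels):
    """Image-level AUROC: probability that an abnormal image (label 1)
    receives a higher score than a normal image (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (assumed): three normal features form the bank used for scoring.
normal_bank = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
test_features = [(0.1, 0.0), (0.9, 0.1), (3.0, 3.0), (2.0, 2.5)]
labels = [0, 0, 1, 1]  # 0 = normal, 1 = abnormal

scores = [embedding_score(f, normal_bank) for f in test_features]
auroc = image_auroc(scores, labels)
mse = reconstruction_score([0.0, 1.0], [0.0, 0.5])
```

In this toy case both abnormal features lie far from the normal bank, so every abnormal score exceeds every normal score. The pairwise AUROC above is quadratic in the number of images and is meant only to make the definition explicit; a library routine would normally be used for evaluation.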