Beyond Academic Benchmarks: Critical Analysis and Best Practices for Visual Industrial Anomaly Detection

Aimira Baitieva, Yacine Bouaouni, Alexandre Briot, Dick Ameln, Souhaiel Khalfaoui, Samet Akcay

TL;DR

This work addresses a critical gap in visual industrial anomaly detection by showing that top-performing models on standard benchmarks (e.g., MVTec-AD) often fail in real production settings. It builds a comprehensive real-world benchmark across nine datasets, standardizes evaluation with a fixed number of training epochs and no center-cropping, and evaluates a diverse set of metrics, including the industry-focused PG2 and PB2, to reflect deployment costs. By classifying models into one-class, unsupervised, and supervised categories and examining factors such as input resolution, data drift, and label noise, the study demonstrates that production-readiness requires more than high AUROC on curated datasets, and it highlights reproducibility and dataset diversity as central challenges. The results yield concrete best-practice recommendations for datasets and evaluation protocols, emphasizing real-world data collection, robust validation, and transparent metric reporting to bridge academia and industry. Overall, the paper provides a pragmatic framework and actionable guidance to align research with production needs in visual industrial AD.
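The point that production-readiness requires more than high AUROC can be made concrete with a small sketch. Image-level AUROC is threshold-free: it only measures how well anomaly scores rank defective images above good ones, whereas a deployed inspection system must commit to a single threshold. The Python below is a minimal illustration under our own assumptions: the toy scores, the threshold of 0.5, and the helper names `image_auroc` and `rates_at_threshold` are hypothetical and do not come from the paper or its repository. It also does not reproduce the paper's PG2 and PB2 metrics, whose definitions are not given here.

```python
def image_auroc(scores, labels):
    """Image-level AUROC via the Mann-Whitney U statistic.

    scores: per-image anomaly scores (higher means more anomalous)
    labels: 1 for defective images, 0 for good images
    A tie between a defective and a good image counts as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one good and one defective image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rates_at_threshold(scores, labels, threshold):
    """Return (miss_rate, false_alarm_rate) when images scoring >= threshold are flagged."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both good and defective images")
    flagged = [s >= threshold for s in scores]
    missed = sum(1 for f, y in zip(flagged, labels) if y == 1 and not f)
    false_alarms = sum(1 for f, y in zip(flagged, labels) if y == 0 and f)
    return missed / n_pos, false_alarms / n_neg


if __name__ == "__main__":
    # Toy data: every defective image scores above every good image.
    scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.90]
    labels = [0, 0, 0, 1, 1, 1]
    print(image_auroc(scores, labels))  # 1.0
    # Hypothetical threshold fixed in advance (e.g. from a validation set).
    miss, fa = rates_at_threshold(scores, labels, threshold=0.5)
    print(f"miss rate {miss:.3f}, false-alarm rate {fa:.3f}")  # miss rate 0.667, false-alarm rate 0.000
```

With these toy scores the ranking is perfect (AUROC = 1.0), yet a threshold fixed at 0.5 still misses two of the three defective images. The pairwise loop is O(n·m) and chosen for readability; scikit-learn's `roc_auc_score` computes the same quantity with a sort-based approach.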

Abstract

Anomaly detection (AD) is essential for automating visual inspection in manufacturing. This field of computer vision is rapidly evolving, with increasing attention towards real-world applications. Meanwhile, popular datasets are typically produced in controlled lab environments with artificially created defects and therefore fail to capture the diversity of real production conditions. New methods often fail in production settings, showing significant performance degradation or requiring impractical computational resources. This disconnect between academic results and industrial viability threatens to misdirect visual anomaly detection research. This paper makes three key contributions: (1) we demonstrate the importance of real-world datasets and establish benchmarks using actual production data, (2) we provide a fair comparison of existing SOTA methods across diverse tasks using metrics that are valuable for practical applications, and (3) we present a comprehensive analysis of recent advancements in this field by discussing important challenges and new perspectives for bridging the academia-industry gap. The code is publicly available at https://github.com/abc-125/viad-benchmark

Paper Structure

This paper contains 36 sections, 6 figures, 15 tables.

Figures (6)

  • Figure 1: Anomaly maps generated by two recent SOTA models, GLASS [glass2024] and SimpleNet [simplenet], overlaid on the original images, illustrate the reality gap in industrial anomaly detection. The top row shows accurate detections on the MVTec AD dataset [mvtec], while the next two rows show failure cases of the same models on the real-world datasets BTAD [btad] and VAD [vad]. The image-level AUROC for the whole dataset is shown in the top-right corner of each anomaly map. More results can be found in the general results table (tab:general_results).
  • Figure 2: Examples of some of the object categories used in this benchmark.
  • Figure 3: Anomaly maps generated by PatchCore and SimpleNet on the Real-IAD categories phone_battery and pcb at different input sizes. The defect is circled in red on the original image.
  • Figure 4: Pixel-level predictions generated by DRAEM on the Real-IAD category woodstick and by SimpleNet on AeBAD, overlaid on the original images. None of the images contain defects. Upper row, from left to right: no synthetic perturbation, random shadowing, color jitter, added noise. Lower row: no natural perturbation, change of camera angle, different lighting, different background.
  • Figure 5: Synthetic perturbations. The image on the left in each row shows the original data, and the images on the right show different augmentations produced using our pipeline.
  • ...and 1 more figure
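Figures 4 and 5 probe false positives on defect-free images under benign perturbations such as random shadowing, color jitter, and added noise. The NumPy sketch below illustrates what such perturbations can look like. It is not the authors' pipeline (their code is in the linked repository); the function names, parameter ranges, and the toy template-difference score are hypothetical choices made for illustration. A detector that is robust in the sense of Figure 4 should keep its score on perturbed good images close to its score on the clean image.

```python
import numpy as np


def add_noise(img, sigma, rng):
    """Additive Gaussian noise, clipped back to the valid [0, 1] range."""
    return np.clip(img + rng.normal(0.0, sigma, size=img.shape), 0.0, 1.0)


def color_jitter(img, max_brightness, max_contrast, rng):
    """Random brightness offset plus contrast scaling around the image mean."""
    brightness = rng.uniform(-max_brightness, max_brightness)
    contrast = rng.uniform(1.0 - max_contrast, 1.0 + max_contrast)
    mean = img.mean()
    return np.clip((img - mean) * contrast + mean + brightness, 0.0, 1.0)


def random_shadow(img, max_strength, rng):
    """Darken a random axis-aligned rectangle to mimic a cast shadow.

    Assumes a float image of at least 2x2 pixels and max_strength in [0, 1].
    """
    h, w = img.shape[:2]
    y0 = rng.integers(0, h // 2)
    x0 = rng.integers(0, w // 2)
    y1 = rng.integers(y0 + 1, h + 1)
    x1 = rng.integers(x0 + 1, w + 1)
    out = img.copy()
    out[y0:y1, x0:x1] *= 1.0 - rng.uniform(0.0, max_strength)
    return out


def perturbation_suite(img, rng):
    """Return the clean image and one example of each synthetic perturbation."""
    img = np.asarray(img, dtype=np.float64)  # expects values in [0, 1]
    return {
        "clean": img,
        "shadow": random_shadow(img, max_strength=0.6, rng=rng),
        "jitter": color_jitter(img, max_brightness=0.1, max_contrast=0.2, rng=rng),
        "noise": add_noise(img, sigma=0.05, rng=rng),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    good = rng.random((64, 64, 3))  # stand-in for a defect-free image
    suite = perturbation_suite(good, rng)
    # Toy "detector": mean absolute difference to the clean reference image.
    for name, variant in suite.items():
        score = float(np.abs(variant - suite["clean"]).mean())
        print(f"{name:>7}: score {score:.4f}")
```

The toy template-difference score is zero for the clean image and will generally be positive for every perturbed variant, even though none of them contains a defect; this is the failure mode the defect-free examples in Figure 4 are meant to expose.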