Beyond Academic Benchmarks: Critical Analysis and Best Practices for Visual Industrial Anomaly Detection
Aimira Baitieva, Yacine Bouaouni, Alexandre Briot, Dick Ameln, Souhaiel Khalfaoui, Samet Akcay
TL;DR
This work addresses a critical gap in visual industrial anomaly detection by showing that top-performing models on standard benchmarks (e.g., MVTec-AD) often fail in real production settings. It builds a comprehensive real-world benchmark across nine datasets, standardizes evaluation with a fixed number of epochs and no center-cropping, and reports a diverse set of metrics, including the industry-focused PG2 and PB2, to reflect deployment costs. By classifying models into one-class, unsupervised, and supervised categories and examining factors such as input resolution, data drift, and label noise, the study demonstrates that production-readiness requires more than high AUROC on curated datasets, and it highlights reproducibility and dataset diversity as central challenges. The results yield concrete best-practice recommendations for datasets and evaluation protocols, emphasizing real-world data collection, robust validation, and transparent metric reporting to bridge academia and industry. Overall, the paper provides a pragmatic framework and actionable guidance to align research with production needs in visual industrial AD.
Abstract
Anomaly detection (AD) is essential for automating visual inspection in manufacturing. This field of computer vision is rapidly evolving, with increasing attention to real-world applications. Meanwhile, popular datasets are typically produced in controlled lab environments with artificially created defects and therefore fail to capture the diversity of real production conditions. New methods often fail in production settings, showing significant performance degradation or requiring impractical computational resources. This disconnect between academic results and industrial viability threatens to misdirect visual anomaly detection research. This paper makes three key contributions: (1) we demonstrate the importance of real-world datasets and establish benchmarks using actual production data, (2) we provide a fair comparison of existing state-of-the-art (SOTA) methods across diverse tasks using metrics that are valuable for practical applications, and (3) we present a comprehensive analysis of recent advancements in this field, discussing important challenges and new perspectives for bridging the academia-industry gap. The code is publicly available at https://github.com/abc-125/viad-benchmark
