The MVTec AD 2 Dataset: Advanced Scenarios for Unsupervised Anomaly Detection

Lars Heckler-Kram, Jan-Hendrik Neudeck, Ulla Scheler, Rebecca König, Carsten Steger

TL;DR

MVTec AD 2 introduces eight advanced 2D anomaly-detection scenarios totaling 8,004 high-resolution images to address saturation in existing benchmarks and to enable testing of robustness under real-world lighting distribution shifts. The dataset provides defect-free training images, diverse test conditions including unseen lighting, and pixel-precise ground truth held on a public evaluation server to ensure fair comparisons. A benchmark of seven state-of-the-art methods shows significant room for improvement, with threshold-independent AU-PRO$_{0.05}$ scores generally below 30-40% and even lower when considering boundary and small defects, despite occasional gains at larger image sizes. The work emphasizes the importance of robustness to distribution shifts and of efficiency, offering a standardized, transparent platform for fair performance assessment and future methodological advances in unsupervised industrial anomaly detection.

Abstract

In recent years, performance on existing anomaly detection benchmarks like MVTec AD and VisA has started to saturate in terms of segmentation AU-PRO, with state-of-the-art models often competing in the range of less than one percentage point. This lack of discriminatory power prevents a meaningful comparison of models and thus hinders progress of the field, especially when considering the inherent stochastic nature of machine learning results. We present MVTec AD 2, a collection of eight anomaly detection scenarios with more than 8000 high-resolution images. It comprises challenging and highly relevant industrial inspection use cases that have not been considered in previous datasets, including transparent and overlapping objects, dark-field and back light illumination, objects with high variance in the normal data, and extremely small defects. We provide comprehensive evaluations of state-of-the-art methods and show that their performance remains below 60% average AU-PRO. Additionally, our dataset provides test scenarios with lighting condition changes to assess the robustness of methods under real-world distribution shifts. We host a publicly accessible evaluation server that holds the pixel-precise ground truth of the test set (https://benchmark.mvtec.com/). All image data is available at https://www.mvtec.com/company/research/datasets/mvtec-ad-2.

Paper Structure

This paper contains 28 sections, 1 equation, 22 figures, 12 tables.

Figures (22)

  • Figure 1: The MVTec AD 2 objects. For each object, one defect-free image and one image with anomalies, outlined in red, are shown. The close-up of the anomaly region shows the pixel-precise ground truth labels.
  • Figure 2: Defect distribution across the MVTec AD, VisA, and MVTec AD 2 datasets. The plot shows the normalized ground truth label distribution for each dataset. In MVTec AD and VisA, defect labels are predominantly concentrated at the center of the images, whereas MVTec AD 2 exhibits a significant number of defects at the image borders. This makes it possible to test methods for robustness against boundary artifacts.
  • Figure 3: Lighting condition changes contained in MVTec AD 2 for several example objects. In addition to the regular exposure, each scene was captured under minor over- and underexposure. Moreover, additional light sources evoke object-specific variations in appearance such as reflections (Vial), uneven illumination (Wall Plugs), or slight changes in color temperature (Rice, Walnuts).
  • Figure 4: Per region overlap (PRO) vs. true positive rate (TPR) for a given ground truth of anomalous data and different cases of thresholded predictions. PRO considers each anomalous region equally, whereas the TPR is dominated by large defects and overestimates defect localization quality.
  • Figure 5: Reducing the integration limit of AU-PRO to foster more meaningful anomaly maps. An exemplary anomaly map of MSFlow on the object Rice for an image size of $\textsf{256}\times\textsf{256}$ is shown and thresholded to obtain the desired false positive rate (FPR). The common integration limit $\textsf{FPR} = \textsf{0.3}$ allows segmented defects that are drastically too large.
  • ...and 17 more figures
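
The contrast described in the Figure 4 caption can be illustrated with a small toy example. The sketch below is not the authors' evaluation code: it assumes binary masks stored as nested lists, assumes 4-connectivity when splitting the ground truth into regions, and uses hypothetical helper names (`connected_regions`, `true_positive_rate`, `per_region_overlap`). It computes both scores for a prediction that covers a large defect completely and misses a one-pixel defect.

```python
from collections import deque


def connected_regions(mask):
    """Split a binary mask into regions (sets of (row, col)), assuming 4-connectivity."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill starting from an unvisited anomalous pixel.
            region = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def true_positive_rate(gt, pred):
    """Fraction of all anomalous ground-truth pixels that the prediction marks."""
    anomalous = [(r, c) for r, row in enumerate(gt) for c, v in enumerate(row) if v]
    hits = sum(1 for r, c in anomalous if pred[r][c])
    return hits / len(anomalous)


def per_region_overlap(gt, pred):
    """Mean over ground-truth regions of the fraction of each region that is marked."""
    regions = connected_regions(gt)
    overlaps = [sum(1 for r, c in reg if pred[r][c]) / len(reg) for reg in regions]
    return sum(overlaps) / len(overlaps)


# Toy ground truth: one 3x3 defect (9 pixels) and one isolated 1-pixel defect.
gt = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
# Toy thresholded prediction: the large defect is found, the small one is missed.
pred = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

tpr = true_positive_rate(gt, pred)  # 9 of 10 anomalous pixels are marked
pro = per_region_overlap(gt, pred)  # mean of 1.0 (large region) and 0.0 (small region)
print(f"TPR = {tpr:.2f}, PRO = {pro:.2f}")
```

In this toy case the pixel-wise TPR is 0.90 although one of the two defects is missed entirely, while the PRO is 0.50 because each region counts equally regardless of its size.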