EfficientAD: Accurate Visual Anomaly Detection at Millisecond-Level Latencies

Kilian Batzner, Lars Heckler, Rebecca König

TL;DR

EfficientAD tackles real-time visual anomaly detection under strict latency constraints. It combines a fast patch description network (PDN) as feature extractor with a lightweight student–teacher detector and an autoencoder that detects logical anomalies. It introduces a hard feature loss, which concentrates training on the feature elements the student predicts worst, and a penalty on pretraining images. Both improve detection without increasing test-time cost. It then fuses the local (student–teacher) and global (autoencoder) anomaly maps after calibrating their scales. Evaluated on 32 industrial datasets, EfficientAD reaches a latency of about two milliseconds and a throughput of about six hundred images per second while delivering state-of-the-art detection and localization performance. This makes it a practical baseline for real-world manufacturing QA and a strong foundation for further research into efficient, explainable anomaly detection.
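The calibrated fusion of local and global maps can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that each map is linearly rescaled using quantiles of its values on anomaly-free validation images and that the two rescaled maps are then averaged. The quantiles (0.9 and 0.995) and their target values (0 and 0.1) are assumed here. The helper names (`quantile`, `fit_normalization`, `normalize`, `fuse`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Map = List[List[float]]  # 2D anomaly map stored as rows of floats


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between sorted values (numpy's default rule)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fit_normalization(val_maps: Sequence[Map], q_a: float = 0.9,
                      q_b: float = 0.995) -> Tuple[float, float]:
    """Return (scale, offset) so the q_a quantile maps to 0 and the q_b quantile to 0.1.

    The quantiles and targets are assumed values, fitted on validation maps.
    """
    flat = [v for m in val_maps for row in m for v in row]
    a, b = quantile(flat, q_a), quantile(flat, q_b)
    scale = 0.1 / (b - a)  # assumes b > a, i.e. the validation maps are not constant
    return scale, -a * scale


def normalize(m: Map, scale: float, offset: float) -> Map:
    """Apply the fitted linear transform to every pixel of a map."""
    return [[v * scale + offset for v in row] for row in m]


def fuse(local_map: Map, global_map: Map,
         local_params: Tuple[float, float],
         global_params: Tuple[float, float]) -> Map:
    """Normalize each map with its own parameters, then average them pixel-wise."""
    ln = normalize(local_map, *local_params)
    gn = normalize(global_map, *global_params)
    return [[(x + y) / 2 for x, y in zip(r1, r2)] for r1, r2 in zip(ln, gn)]


# Toy usage: fit on validation maps of each branch, then fuse one test image's maps.
local_params = fit_normalization([[[0.0, 1.0], [2.0, 3.0]]])
global_params = fit_normalization([[[0.0, 2.0], [4.0, 6.0]]])
fused = fuse([[3.0, 0.5]], [[6.0, 1.0]], local_params, global_params)
```

Fitting a separate scale per branch keeps one branch from dominating the fused map merely because its raw scores span a larger range. In the toy data the global scores are exactly twice the local ones, so after normalization both branches agree pixel by pixel.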

Abstract

Detecting anomalies in images is an important task, especially in real-time computer vision applications. In this work, we focus on computational efficiency and propose a lightweight feature extractor that processes an image in less than a millisecond on a modern GPU. We then use a student-teacher approach to detect anomalous features. We train a student network to predict the extracted features of normal, i.e., anomaly-free training images. At test time, anomalies are detected because the student fails to predict their features. We propose a training loss that hinders the student from imitating the teacher feature extractor beyond the normal images. It allows us to drastically reduce the computational cost of the student-teacher model, while improving the detection of anomalous features. We furthermore address the detection of challenging logical anomalies that involve invalid combinations of normal local features, for example, a wrong ordering of objects. We detect these anomalies by efficiently incorporating an autoencoder that analyzes images globally. We evaluate our method, called EfficientAD, on 32 datasets from three industrial anomaly detection dataset collections. EfficientAD sets new standards for both the detection and the localization of anomalies. At a latency of two milliseconds and a throughput of six hundred images per second, it enables fast handling of anomalies. Together with its low error rate, this makes it an economical solution for real-world applications and a fruitful basis for future research.
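The training loss described above can be sketched on flattened feature tensors. This sketch rests on two assumptions of ours that this excerpt does not spell out. First, the hard feature loss averages only the squared teacher-student differences at or above a high quantile `p_hard` (0.999 is an assumed default). Second, the penalty is the mean squared student output on a pretraining image, for example one drawn from ImageNet. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between sorted values."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def hard_feature_loss(teacher: Sequence[float], student: Sequence[float],
                      p_hard: float = 0.999) -> Tuple[float, List[int]]:
    """Average only the hardest squared teacher-student differences.

    Returns the loss and a 0/1 mask of the elements that contribute to it.
    """
    d = [(t - s) ** 2 for t, s in zip(teacher, student)]
    # Clamp to max(d) so at least one element is always selected.
    d_hard = min(quantile(d, p_hard), max(d))
    mask = [1 if v >= d_hard else 0 for v in d]
    selected = [v for v, m in zip(d, mask) if m]
    return sum(selected) / len(selected), mask


def pretraining_penalty(student_on_pretrain: Sequence[float]) -> float:
    """Mean squared student output on a pretraining image (pushes outputs toward 0)."""
    return sum(v * v for v in student_on_pretrain) / len(student_on_pretrain)


def student_loss(teacher: Sequence[float], student: Sequence[float],
                 student_on_pretrain: Sequence[float], p_hard: float = 0.999) -> float:
    """Hard feature loss on a normal image plus the penalty on a pretraining image."""
    l_hard, _ = hard_feature_loss(teacher, student, p_hard)
    return l_hard + pretraining_penalty(student_on_pretrain)
```

The returned 0/1 mask marks which elements would be backpropagated. Summing it over channels at each spatial position gives the kind of per-location count that Figure 4 visualizes as brightness.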

Paper Structure (36 sections, 11 figures, 17 tables, 3 algorithms)

Figures (11)

  • Figure 1: Anomaly detection performance vs. latency per image on an NVIDIA RTX A6000 GPU. Each AU-ROC value is an average of the image-level detection AU-ROC values on the MVTec AD [bergmann2021_mvtec_ad_ijcv, bergmann2019_mvtec_ad_cvpr], VisA [zou2022spot], and MVTec LOCO [bergmann2021_mvtec_loco_ijcv] dataset collections.
  • Figure 2: Patch description network (PDN) architecture of EfficientAD-S. Applying it to an image in a fully convolutional manner yields all features in a single forward pass.
  • Figure 3: Upper row: absolute gradient of a single feature vector, located in the center of the output, with respect to each input pixel, averaged across input and output channels. Lower row: average feature map of the first output channel across 1000 randomly chosen images from ImageNet [russakovsky2015_alexnet]. The mean of these images is shown on the left. The feature maps of the DenseNet [huang2017densely] and the WideResNet exhibit strong artifacts.
  • Figure 4: Randomly picked loss masks generated by the hard feature loss during training. The brightness of a mask pixel indicates how many of the dimensions of the respective feature vector were selected for backpropagation. The student network already mimics the teacher well on the background and thus focuses on learning the features of differently rotated screws.
  • Figure 5: EfficientAD applied to two test images from MVTec LOCO. Normal input images contain a horizontal cable connecting the two splicing connectors at an arbitrary height. The anomaly on the left is a foreign object in the form of a small metal washer at the end of the cable. It is visible in the local anomaly map because the outputs of the student and the teacher differ. The logical anomaly on the right is the presence of a second cable. The autoencoder fails to reconstruct the two cables on the right in the feature space of the teacher. The student also predicts the output of the autoencoder in addition to that of the teacher. Because its receptive field is restricted to small patches of the image, it is not influenced by the presence of the additional red cable. This causes the outputs of the autoencoder and the student to differ. "Diff" refers to computing the element-wise squared difference between two collections of output feature maps and computing its average across feature maps. To obtain pixel anomaly scores, the anomaly maps are resized to match the input image using bilinear interpolation.
  • ...and 6 more figures
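The "Diff" operation and the final resizing described in the Figure 5 caption can be sketched as follows. This is an illustrative pure-Python version on nested lists, and all names are hypothetical. The half-pixel-center sampling in `resize_bilinear` is our assumed convention for bilinear interpolation; the caption does not specify one.

```python
import math
from typing import List

FeatureMaps = List[List[List[float]]]  # indexed as [channel][row][col]
Map = List[List[float]]                # indexed as [row][col]


def diff(a: FeatureMaps, b: FeatureMaps) -> Map:
    """Element-wise squared difference, averaged across feature maps (channels)."""
    c, h, w = len(a), len(a[0]), len(a[0][0])
    return [[sum((a[k][i][j] - b[k][i][j]) ** 2 for k in range(c)) / c
             for j in range(w)] for i in range(h)]


def _src(coord: int, in_size: int, out_size: int) -> float:
    """Map an output index to a source coordinate (half-pixel centers, clamped)."""
    x = (coord + 0.5) * in_size / out_size - 0.5
    return min(max(x, 0.0), float(in_size - 1))


def resize_bilinear(m: Map, out_h: int, out_w: int) -> Map:
    """Upsample an anomaly map to the input image size with bilinear interpolation."""
    in_h, in_w = len(m), len(m[0])
    out = []
    for i in range(out_h):
        y = _src(i, in_h, out_h)
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = _src(j, in_w, out_w)
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along columns in the two neighboring rows, then between rows.
            top = m[y0][x0] * (1 - wx) + m[y0][x1] * wx
            bottom = m[y1][x0] * (1 - wx) + m[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

Following the caption, the local map would be `diff(teacher_out, student_out)`, and the global map would be `diff(autoencoder_out, student_ae_out)`, where `student_ae_out` stands for the student's prediction of the autoencoder output. Each map is then passed through `resize_bilinear` with the input image's height and width.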