GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features

Luc P. J. Sträter, Mohammadreza Salehi, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano

TL;DR

GeneralAD addresses the cross-domain anomaly detection gap by combining patch-based Vision Transformer features with a self-supervised anomaly feature generation (SAG) module and a cross-patch attention discriminator. The SAG module creates challenging pseudo-abnormal features, enabling robust detection and interpretable localization, from the patch level to the image level, across semantic, near-distribution, and industrial tasks. The approach achieves state-of-the-art or competitive results on ten datasets, with strong localization maps and minimal per-task adjustments. This work narrows the gap between semantic and industrial anomaly methods and offers a versatile framework for real-world anomaly detection with interpretable outputs.

Abstract

In the domain of anomaly detection, methods often excel in either high-level semantic or low-level industrial benchmarks, rarely achieving cross-domain proficiency. Semantic anomalies are novelties that differ in meaning from the training set, like unseen objects in self-driving cars. In contrast, industrial anomalies are subtle defects that preserve semantic meaning, such as cracks in airplane components. In this paper, we present GeneralAD, an anomaly detection framework designed to operate in semantic, near-distribution, and industrial settings with minimal per-task adjustments. In our approach, we capitalize on the inherent design of Vision Transformers, which are trained on image patches, thereby ensuring that the last hidden states retain a patch-based structure. We propose a novel self-supervised anomaly generation module that applies straightforward operations, such as noise addition and shuffling, to patch features to construct pseudo-abnormal samples. These features are fed to an attention-based discriminator, which is trained to score every patch in the image. With this, our method can both accurately identify anomalies at the image level and generate interpretable anomaly maps. We extensively evaluated our approach on ten datasets, achieving state-of-the-art results on six and on-par performance on the remaining four for both localization and detection tasks.
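The feature-distortion idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `distort_patch_features`, the `distort_fraction` parameter, the even split between noise and shuffling, and the use of a rotation as the shuffle are all assumptions made for this example. Only the two operations (noise addition and shuffling of patch features) come from the text.

```python
import random


def distort_patch_features(features, noise_std=0.25, distort_fraction=0.5, rng=None):
    """Build pseudo-abnormal patch features (hypothetical sketch).

    features: list of patches, each a list of floats (one feature vector).
    Returns (distorted_features, labels), where labels[i] is 1 if patch i
    was distorted and 0 otherwise. The input is not modified.
    """
    rng = rng or random.Random()
    num_patches = len(features)
    num_distorted = max(1, int(round(distort_fraction * num_patches)))
    chosen = rng.sample(range(num_patches), num_distorted)

    distorted = [list(row) for row in features]
    labels = [0] * num_patches

    # Assumption: half of the chosen patches get noise, the rest are shuffled.
    half = (num_distorted + 1) // 2
    noisy, shuffled = chosen[:half], chosen[half:]

    # Operation 1: add Gaussian noise to the selected patch features.
    for i in noisy:
        distorted[i] = [v + rng.gauss(0.0, noise_std) for v in features[i]]
        labels[i] = 1

    # Operation 2: move features between the selected positions.
    # A rotation by one guarantees that every position receives a
    # feature vector taken from a different position.
    sources = shuffled[1:] + shuffled[:1]
    for dst, src in zip(shuffled, sources):
        if dst != src:
            distorted[dst] = list(features[src])
            labels[dst] = 1

    return distorted, labels
```

The returned labels could serve as per-patch training targets for a discriminator that scores every patch, which is the role the abstract assigns to the attention-based discriminator.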

Paper Structure (19 sections, 4 equations, 7 figures, 8 tables)

Figures (7)

  • Figure 1: Overview of the proposed method. During training, an image is first split into smaller patches and passed to the pretrained vision encoder to extract patch features. Next, the extracted patch features are distorted by the self-supervised anomaly feature generation module, labeled as feature distortion. Finally, all the patches are passed to the cross-patch attention discriminator, whose role is to detect semantic, logical, or structural distortions. During inference, the feature distortion module is deactivated, and the discriminator is used for both detection and localization tasks. The lower part of the figure displays various normal distributions.
  • Figure 2: Qualitative localization results. In the first row, normal samples on which the model is trained are shown. In the second row, we show the real anomaly samples; in the third, we show our localization maps. Our method provides interpretable anomaly segmentation maps for both industrial and semantic tasks. For example, when trained on dog images (Elson et al., 2007), it can explain why a cat is an abnormal input. Similarly, when the normal class is cars, the entire object from a different class is localized as an anomaly.
  • Figure 3: Comparison of localization maps. We conducted a qualitative comparison between the localization maps of our method, SimpleNet, and KDAD. Qualitatively, our method shows a higher true positive rate and a lower false positive rate, thus providing better semantic localization maps.
  • Figure 4: The effect of $K$ and noise magnitude. Independent of the type of anomaly, the best performance is found with a moderate amount of Gaussian noise. Therefore, we select $\epsilon = 0.25$ for all experiments. The optimal top-$K$ parameter depends on the size of the anomalies in the dataset; thus, we choose $K=1369$ for semantic and near-distribution anomaly detection and $K=10$ for industrial anomaly detection.
  • Figure 5: Normal and Anomaly samples from all datasets.
  • ...and 2 more figures
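
The role of the top-$K$ parameter from Figure 4 can be illustrated with a short sketch. It assumes that the image-level anomaly score is the mean of the $K$ highest patch scores; the caption does not state the exact aggregation rule, so this rule and the function name `image_score_top_k` are assumptions made for illustration.

```python
import heapq


def image_score_top_k(patch_scores, k):
    """Aggregate per-patch anomaly scores into one image-level score.

    Assumption: the image score is the mean of the k largest patch scores.
    k is clipped to the number of patches.
    """
    if not patch_scores:
        raise ValueError("patch_scores must not be empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, len(patch_scores))
    top = heapq.nlargest(k, patch_scores)
    return sum(top) / k


# A small defect: 10 high-scoring patches among 100.
scores = [0.1] * 90 + [0.9] * 10
small_k = image_score_top_k(scores, 10)   # focuses on the defect patches
large_k = image_score_top_k(scores, 100)  # averages over the whole image
```

Under this assumption, a small $K$ keeps a localized defect from being averaged away, which matches the choice of $K=10$ for industrial data, while a large $K$ suits anomalies that cover most of the image, as in the semantic setting.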