
Enhancing Anomaly Detection Generalization through Knowledge Exposure: The Dual Effects of Augmentation

Mohammad Akhavan Anvari, Rojina Kashefi, Vahid Reza Khazaie, Mohammad Khalooei, Mohammad Sabokrou

TL;DR

The paper tackles the problem of poor out-of-distribution generalization in anomaly detection arising from semantically shifting transformations and biased benchmarks. It introduces Knowledge Exposure (KE), a CLIP-informed mechanism that uses the Wasserstein distance between CLIP representations to dynamically select semantically preserving and altering augmentations, and trains a ResNet-18 with a contrastive loss $L_{\text{contrastive}} = -\log \frac{\exp(\text{sim}(z,z^{+})/\tau)}{\sum_{z'\in\mathcal{Z}} \exp(\text{sim}(z,z')/\tau)}$ to learn robust features, followed by a one-class SVM for anomaly scoring. The approach is evaluated on CIFAR-10/100 and SVHN under Semantic-Preserving Augmentation (SPA) and Semantic-Shift Aware (SSA) protocols, showing superior AUROC over strong baselines and illustrating concept-dependent transformation semantics (e.g., flip vs. rotation). Overall, KE provides a practical pathway to more trustworthy anomaly detectors by accounting for the dual nature of augmentations, though it relies on large pre-trained models to supply knowledge about transformations.
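The contrastive objective quoted in the TL;DR can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: it assumes $\text{sim}$ is cosine similarity, that the candidate set $\mathcal{Z}$ contains the positive $z^{+}$, and that $\tau = 0.5$; none of these choices are specified in the summary above, and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed choice of sim)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(z, z_pos, candidates, tau=0.5):
    """-log( exp(sim(z, z+)/tau) / sum_{z' in Z} exp(sim(z, z')/tau) ).

    `candidates` plays the role of the set Z and is assumed to include z_pos.
    `tau` is the temperature; 0.5 is an arbitrary illustrative default.
    """
    pos_logit = cosine_similarity(z, z_pos) / tau
    logits = [cosine_similarity(z, c) / tau for c in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return log_denominator - pos_logit

# Toy usage: the anchor matches its positive and is opposite to the negative.
anchor = [1.0, 0.0]
positive = [1.0, 0.0]
negative = [-1.0, 0.0]
loss = contrastive_loss(anchor, positive, [positive, negative], tau=1.0)
```

The loss shrinks as the anchor moves closer to its positive relative to the other candidates, which is the behaviour the selected positive and negative augmentations are meant to exploit.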

Abstract

Anomaly detection involves identifying instances within a dataset that deviate from the norm and occur infrequently. Current benchmarks tend to favor methods biased towards low diversity in normal data, which does not align with real-world scenarios. Despite advancements in these benchmarks, contemporary anomaly detection methods often struggle with out-of-distribution generalization, particularly in classifying samples with subtle transformations during testing. These methods typically assume that normal samples during test time have distributions very similar to those in the training set, while anomalies are distributed much further away. However, real-world test samples often exhibit various levels of distribution shift while maintaining semantic consistency. Therefore, effectively generalizing to samples that have undergone semantic-preserving transformations, while accurately detecting normal samples whose semantic meaning has changed after transformation as anomalies, is crucial for the trustworthiness and reliability of a model. For example, although it is clear that rotation shifts the meaning for a car in the context of anomaly detection but preserves the meaning for a bird, current methods are likely to detect both as abnormal. This complexity underscores the necessity for dynamic learning procedures rooted in the intrinsic concept of outliers. To address this issue, we propose new testing protocols and a novel method called Knowledge Exposure (KE), which integrates external knowledge to comprehend concept dynamics and differentiate transformations that induce semantic shifts. This approach enhances generalization by utilizing insights from a pre-trained CLIP model to evaluate the significance of anomalies for each concept. Evaluation on CIFAR-10, CIFAR-100, and SVHN with the new protocols demonstrates superior performance compared to previous methods.

Paper Structure (5 sections, 5 equations, 5 figures, 5 tables)


Figures (5)

  • Figure 1: Comparison of the standard setup with our proposed anomaly detection setups (generalization test and realistic anomaly detection) on the CIFAR-10 \cite{krizhevsky2009learning}, CIFAR-100 \cite{krizhevsky2009learning}, and SVHN \cite{netzer2018street} datasets (setup details are in Section \ref{exp_res}). Left: images under various transformations; red borders mark negative/anomaly pairs and green borders mark positive/non-anomaly pairs, selected dynamically for each class through Knowledge Exposure (KE). Right: AUROC percentages for different methods. Other methods suffer significant performance drops on the more realistic setups due to overfitting and lack of generalizability, whereas our method consistently maintains higher AUROC scores, demonstrating the robustness of dynamic anomaly detection through KE.
  • Figure 2: The training stage of our anomaly detection framework involves three key steps. First, we apply various geometric and non-geometric transformations to the images of each class. Next, we feed both the original and transformed images into the CLIP image encoder and compute the Wasserstein distance between the distributions of the transformed and original images in the encoder's output space. Finally, we use the transformations with the smallest distance as positive pairs and those with the greatest distance as negative pairs in contrastive learning. This process ensures effective representation learning for anomaly detection.
  • Figure 3: An example of the Knowledge Exposure (KE) mechanism determining positive and negative transformations. For vehicles, KE identified the flip transformation as positive and the 90-degree rotation as negative. Conversely, for fruits, KE deemed the 90-degree rotation a positive transformation and color jitter a negative one. For sunflowers, a 90-degree rotation is considered positive, while glass blur is considered negative. This selection makes sense because a flip does not change the fundamental appearance of vehicles, whereas a 90-degree rotation does. Similarly, for fruits, color jitter significantly alters their natural appearance, while rotation does not. A 90-degree rotation is also a fitting positive for sunflowers because they usually incline to one side, while glass blur, which distorts their petals, is a negative. Additional images of CIFAR-100 classes can be found in Appendix \ref{sec:app}.
  • Figure 4: Distribution of distances under different transformations for all instances of the "car" class in the CIFAR-10 dataset: the blue histogram represents the original (normal) images, the green histogram represents flipped images, and the red histogram represents 90-degree rotated images. The significant overlap between the rotated and normal data indicates that the model has a strong relation to rotation.
  • Figure 5: Visualization of dynamically chosen negative and positive pairs for each class through Knowledge Exposure (KE) from randomly selected classes in CIFAR-100.
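
The selection step described in the Figure 2 caption (smallest distance gives the positive transformation, greatest distance gives the negative one) can be sketched as follows. This is a minimal illustration under several assumptions: the embeddings are taken as precomputed lists of vectors standing in for CLIP image-encoder outputs, the Wasserstein distance is approximated by a sliced (random-projection) estimate because the summary does not say how the distance is computed in high dimensions, and only a single positive and a single negative are picked per class. All function names are hypothetical.

```python
import math
import random

def wasserstein_1d(xs, ys):
    """Exact W1 between two equal-size 1-D samples: mean gap between sorted values."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def sliced_wasserstein(emb_a, emb_b, n_projections=64, seed=0):
    """Average 1-D W1 over random unit directions (an assumed approximation)."""
    rng = random.Random(seed)
    dim = len(emb_a[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction and normalise it to unit length.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
        proj_a = [sum(x * d for x, d in zip(v, direction)) for v in emb_a]
        proj_b = [sum(x * d for x, d in zip(v, direction)) for v in emb_b]
        total += wasserstein_1d(proj_a, proj_b)
    return total / n_projections

def select_pairs(original_emb, transformed_emb):
    """Return (positive, negative, distances) for one class.

    `transformed_emb` maps a transformation name to the embeddings of the
    transformed images of that class.
    """
    distances = {
        name: sliced_wasserstein(original_emb, emb)
        for name, emb in transformed_emb.items()
    }
    positive = min(distances, key=distances.get)  # closest to the originals
    negative = max(distances, key=distances.get)  # farthest from the originals
    return positive, negative, distances

# Toy usage with synthetic embeddings: a small shift mimics a
# semantic-preserving transformation, a large shift a semantic-shifting one.
rng = random.Random(1)
original = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)]
flipped = [[x + 0.01 for x in v] for v in original]
rotated = [[x + 5.0 for x in v] for v in original]
positive, negative, distances = select_pairs(
    original, {"flip": flipped, "rotate90": rotated}
)
```

Because the choice is made per class, the same transformation can end up positive for one concept and negative for another, which is the concept-dependent behaviour the Figure 3 caption describes.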