
Uncertainty-aware Evaluation of Auxiliary Anomalies with the Expected Anomaly Posterior

Lorenzo Perini, Maja Rudolph, Sabrina Schmedding, Chen Qiu

TL;DR

This work tackles the challenge of evaluating the quality of auxiliary synthetic anomalies used to train anomaly detectors. It introduces the Expected Anomaly Posterior (EAP), a Bayesian uncertainty-based score that combines an example's density and its distinguishability to quantify auxiliary anomaly quality. The approach derives a closed-form quality measure, φ(x), by modeling the anomaly probability with a Beta prior and updating via pseudo-observations, while estimating density with a fast rarity-based density proxy and the conditional anomaly probability with a calibrated squashing function. Theoretical analysis establishes convergence and ranking guarantees, and extensive experiments on 40 datasets demonstrate that EAP outperforms 12 adapted baselines for quality scoring, data augmentation, and model selection, including zero-shot CLIP prompt tuning.
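
The Beta-posterior idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The density proxy (an exponentiated mean k-nearest-neighbor distance standing in for the rarity-based proxy), the logistic squashing function, and the pseudo-count rule are all assumptions. Every function and parameter name (density_proxy, squash, expected_anomaly_posterior, n_pseudo, alpha0, beta0, bandwidth, threshold, temperature) is hypothetical. Under these assumptions, an example $x$ with density proxy $q(x) \in (0, 1]$ and anomaly probability $p(x)$ contributes $N q(x)$ pseudo-observations to a $\mathrm{Beta}(\alpha_0, \beta_0)$ prior. The resulting posterior mean is $\varphi(x) = (\alpha_0 + N q(x) p(x)) / (\alpha_0 + \beta_0 + N q(x))$.

```python
import numpy as np


def density_proxy(x, reference, k=5, bandwidth=1.0):
    """Hypothetical stand-in for a rarity-based density proxy (not the paper's).

    Maps the mean distance from each row of `x` to its k nearest rows of
    `reference` into (0, 1]: near 1 in dense regions, near 0 far from the data.
    """
    dists = np.linalg.norm(x[:, None, :] - reference[None, :, :], axis=-1)
    knn_mean = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    return np.exp(-knn_mean / bandwidth)


def squash(scores, threshold=0.0, temperature=1.0):
    """Hypothetical squashing function: logistic map from detector scores to P(anomaly | x)."""
    return 1.0 / (1.0 + np.exp(-(scores - threshold) / temperature))


def expected_anomaly_posterior(density, p_anom, n_pseudo=100.0, alpha0=1.0, beta0=1.0):
    """Posterior mean of a Beta(alpha0, beta0) prior on P(anomaly | x).

    Assumed update: x contributes n_pseudo * density pseudo-observations, a
    fraction p_anom of which count as anomalous. With little evidence (low
    density) the score falls back to the prior mean alpha0 / (alpha0 + beta0).
    """
    evidence = n_pseudo * density
    alpha = alpha0 + evidence * p_anom
    beta = beta0 + evidence * (1.0 - p_anom)
    return alpha / (alpha + beta)


# Toy usage: normal training data and three candidate auxiliary anomalies.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 2))
candidates = np.array([
    [0.5, 0.5],    # dense region, detector confident it is anomalous
    [10.0, 10.0],  # far from the data (low density)
    [0.0, 0.0],    # dense region, detector undecided
])
detector_scores = np.array([3.0, 5.0, 0.0])  # made-up detector outputs

phi = expected_anomaly_posterior(
    density_proxy(candidates, train), squash(detector_scores)
)
print(np.round(phi, 3))
```

In this toy run, the dense and confidently anomalous candidate gets the highest score. The far-away candidate has almost no evidence, and the undecided candidate has balanced evidence, so both stay at the prior mean of 0.5. The pseudo-count form interpolates between the prior and the detector's output according to density.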

Abstract

Anomaly detection is the task of identifying examples that do not behave as expected. Because anomalies are rare and unexpected events, collecting real anomalous examples is challenging in many applications. In addition, learning an anomaly detector with limited (or no) anomalies often yields poor prediction performance. One option is to employ auxiliary synthetic anomalies to improve model training. However, synthetic anomalies may be of poor quality: anomalies that are unrealistic or indistinguishable from normal samples may degrade the detector's performance. Unfortunately, no existing method quantifies the quality of auxiliary anomalies. We fill this gap and propose the expected anomaly posterior (EAP), an uncertainty-based score function that measures the quality of auxiliary anomalies by quantifying the total uncertainty of an anomaly detector. Experiments on 40 benchmark datasets of images and tabular data show that EAP outperforms 12 adapted data quality estimators in the majority of cases.

Paper Structure

This paper contains 32 sections, 1 theorem, 16 equations, 2 figures, and 4 tables.

Key Result

Theorem 4.1

Let $x_{\textsc{r}}, x_{\textsc{u}}, x_{\textsc{i}} \in \mathbb{R}^d$ be, respectively, a realistic, an unrealistic, and an indistinguishable anomaly. If the density estimator and the class-conditional anomaly probability estimator satisfy the properties of Definition 3.1 (Categorization of Anomalies), then

Figures (2)

  • Figure 1: The plot illustrates the average AUC$_{\textsc{qlt}}$ obtained by each method on a per-dataset basis (left: image data; right: tabular data). EAP achieves the highest (best) performance on most datasets, beating the runners-up Rarity and Lava on $30$ and $31$ of the $40$ datasets, respectively.
  • Figure 2: Learning curves (LC) obtained by following each method's ordering (top) and inverse ordering (bottom) for five representative image datasets. Top: EAP's LC$_{\textsc{g}}$ grows sooner (i.e., better) than the other methods', which confirms that including high-quality anomalies in the training set has a larger impact on test performance. Bottom: EAP's LC$_{\textsc{p}}$ rises later (i.e., better) than most baselines', showing that low-quality anomalies have a comparatively modest impact on test performance.
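
The two evaluation protocols in these captions can be sketched in Python. This sketch assumes that AUC$_{\textsc{qlt}}$ is the ROC AUC of the quality scores against binary ground-truth labels (high vs. low quality). It also assumes that a learning curve records test performance as auxiliary anomalies are added to the training set in score order. The function names, the evaluate callback, and the toy evaluator are hypothetical and not taken from the paper. The toy evaluator reports the fraction of high-quality anomalies included, standing in for retraining a detector.

```python
import numpy as np


def auc_quality(scores, is_high_quality):
    """ROC AUC of quality scores against binary quality labels (hypothetical helper).

    Mann-Whitney form: the probability that a random high-quality anomaly is
    scored above a random low-quality one, with ties counted as 1/2.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_high_quality, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def learning_curve(scores, evaluate, inverse=False):
    """Test metric after adding auxiliary anomalies one at a time in score order.

    `evaluate(indices)` is a hypothetical callback that would retrain the
    detector with the selected anomalies and return a test metric.
    inverse=False adds the highest-scored anomalies first (method's ordering);
    inverse=True adds the lowest-scored first (inverse ordering).
    """
    order = np.argsort(scores, kind="stable")
    if not inverse:
        order = order[::-1]
    return [evaluate(order[:n]) for n in range(1, len(order) + 1)]


# Toy data: quality scores and ground-truth quality labels (1 = high quality).
scores = np.array([0.9, 0.2, 0.7, 0.4])
quality = np.array([1, 0, 1, 0])


def toy_eval(indices):
    # Stand-in for retraining: fraction of high-quality anomalies included so far.
    return quality[indices].sum() / quality.sum()


print(auc_quality(scores, quality))  # 1.0
print(learning_curve(scores, toy_eval))
print(learning_curve(scores, toy_eval, inverse=True))
```

A better quality scorer pushes the first curve up early and keeps the inverse-order curve low for longer. This mirrors the behavior the Figure 2 caption describes as "better".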

Theorems & Definitions (3)

  • Definition 3.1: Categorization of Anomalies
  • Theorem 4.1
  • proof