
Exploring the Impact of Outlier Variability on Anomaly Detection Evaluation Metrics

Minjae Ok, Simon Klüttermann, Emmanuel Müller

TL;DR

The F1 score and AUCPR are sensitive to the outlier fraction, whereas ROC AUC remains consistent under such variability. When the outlier fraction in the test set is fixed, ROC AUC and AUCPR align, so the choice between these two metrics may be less critical.

Abstract

Anomaly detection is a dynamic field in which model evaluation plays a critical role in understanding effectiveness. The selection and interpretation of evaluation metrics are pivotal, particularly when the number of anomalies varies. This study examines how three widely used anomaly detection metrics behave under different conditions: the F1 score, the Area Under the Receiver Operating Characteristic Curve (ROC AUC), and the Area Under the Precision-Recall Curve (AUCPR). We analyze the extent to which these metrics provide reliable and distinct insights into model performance, especially under varying outlier fractions and contamination thresholds. Through a comprehensive experimental setup involving widely recognized anomaly detection algorithms, we present findings that challenge the conventional understanding of these metrics and reveal nuanced behaviors under varying conditions. We demonstrate that while the F1 score and AUCPR are sensitive to outlier fractions, ROC AUC maintains consistency and is unaffected by such variability. Additionally, when the outlier fraction in the test set is fixed, we observe an alignment between ROC AUC and AUCPR, indicating that the choice between these two metrics may be less critical in such scenarios. These results contribute to a more refined understanding of metric selection and interpretation in anomaly detection, offering insights for both researchers and practitioners.
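
The differing sensitivity described above can be traced back to the metric definitions. The following is a short sketch in notation that is assumed here and not taken from the paper: $\pi$ is the outlier fraction, $s$ the anomaly score, $y = 1$ marks an outlier, and $t$ is a decision threshold.

$$
\mathrm{TPR}(t) = P(s > t \mid y = 1), \qquad \mathrm{FPR}(t) = P(s > t \mid y = 0)
$$

$$
\mathrm{Precision}(t) = \frac{\pi \, \mathrm{TPR}(t)}{\pi \, \mathrm{TPR}(t) + (1 - \pi) \, \mathrm{FPR}(t)}
$$

Both rates that define the ROC curve are conditioned on the class, so $\pi$ does not appear in them. Precision, which enters both the F1 score and the precision-recall curve, depends on $\pi$ directly. As a worked example, take $\mathrm{TPR} = 0.8$ and $\mathrm{FPR} = 0.1$: with $\pi = 0.5$ the precision is $0.4 / 0.45 \approx 0.89$, while with $\pi = 0.01$ it is $0.008 / 0.107 \approx 0.07$, even though the ROC operating point is unchanged.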

Paper Structure (22 sections, 4 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Correlation of F1 Score and AUCPR at fixed $50\%$ Outlier Fraction: This figure displays the strong Spearman correlation between the F1 score and AUCPR across contamination levels of $1\%$, $5\%$, and $10\%$. The correlation demonstrates the F1 score's increasing stability and decreasing variability with higher contamination levels, affirming a robust relationship between these metrics when the outlier fraction is stable.
  • Figure 2: Alignment of AUCPR and ROC AUC at a stable outlier fraction of $50\%$: The figure illustrates the near-perfect alignment between AUCPR and ROC AUC, with a Spearman correlation of $97\%$. This high correlation indicates that, when the outlier fraction is held constant, AUCPR ranks models almost identically to ROC AUC, with minimal deviation.
  • Figure 3: Low Correlation of AUCPR and ROC AUC in Variable Outlier Fractions: This figure illustrates the significant reduction in the correlation between AUCPR and ROC AUC when outlier fractions vary. It underscores the sensitivity of the AUCPR metric to changes in outlier distribution, emphasizing the challenges in performance evaluation for anomaly detection in environments with non-stable outlier conditions.
  • Figure 4: Comparative Analysis of F1 Score, ROC AUC, and AUCPR under Random and Fixed Fraction Conditions: This figure illustrates the variability and correlation of the F1 Score, ROC AUC, and AUCPR under both random and fixed outlier fractions. Notably, ROC AUC exhibits higher consistency and robustness across varying conditions, maintaining stronger correlations compared to F1 Score and AUCPR, which show significant variability, particularly under random fractions.
  • Figure 5: Comparative Performance of Evaluation Metrics with Increasing Mean Separation: Depicted here is how the ROC AUC and AUCPR metrics tend to converge as the mean separation between normal and anomalous distributions increases.
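
The behaviors summarized in the figure captions can be explored with a small synthetic simulation. This is a minimal sketch under stated assumptions, not the authors' experimental setup: scores are drawn from two unit-variance Gaussians (inliers centered at 0, outliers at `separation`), AUCPR is estimated as average precision, and the F1 score is computed by flagging the top `contamination` share of scores, which is only one plausible reading of a contamination threshold. All function names are hypothetical and the code uses only the Python standard library.

```python
import random

def roc_auc(labels, scores):
    """ROC AUC as the Mann-Whitney rank statistic; ties share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1  # 1-based average rank
        start = end + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(labels, scores):
    """AUCPR estimated as average precision over the ranked scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank  # precision at this outlier's rank
    return total / hits

def f1_at_contamination(labels, scores, contamination):
    """F1 when the top `contamination` share of points is flagged (assumed rule)."""
    n_flag = max(1, round(contamination * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = sum(labels[i] for i in order[:n_flag])
    if tp == 0:
        return 0.0
    precision, recall = tp / n_flag, tp / sum(labels)
    return 2 * precision * recall / (precision + recall)

def evaluate(outlier_fraction, separation, n=20000, seed=0):
    """Score synthetic data: inliers ~ N(0, 1), outliers ~ N(separation, 1)."""
    rng = random.Random(seed)
    n_out = round(outlier_fraction * n)
    labels = [1] * n_out + [0] * (n - n_out)
    scores = [rng.gauss(separation * y, 1.0) for y in labels]
    return {
        "roc_auc": roc_auc(labels, scores),
        "aucpr": average_precision(labels, scores),
        "f1": f1_at_contamination(labels, scores, contamination=0.10),
    }

# Vary the outlier fraction at a fixed separation (compare Figures 3 and 4).
by_fraction = {f: evaluate(f, separation=1.5) for f in (0.01, 0.10, 0.50)}
# Vary the mean separation at a fixed 50% outlier fraction (compare Figure 5).
by_separation = {d: evaluate(0.50, separation=d) for d in (0.5, 2.0, 4.0)}

for name, table in (("outlier fraction", by_fraction), ("mean separation", by_separation)):
    for key, m in table.items():
        print(f"{name}={key}: ROC AUC={m['roc_auc']:.3f} "
              f"AUCPR={m['aucpr']:.3f} F1={m['f1']:.3f}")
```

In this toy setting, ROC AUC should stay roughly constant as the outlier fraction changes while AUCPR and F1 shift substantially, and both area metrics should approach 1 as the separation grows. The exact values depend on the Gaussian assumption and the seed, so they should not be read as reproducing the paper's numbers.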