Dependency-based Anomaly Detection: a General Framework and Comprehensive Evaluation

Sha Lu, Lin Liu, Kui Yu, Thuc Duy Le, Jixue Liu, Jiuyong Li

TL;DR

DepAD introduces a general, dependency-based anomaly-detection framework that reframes unsupervised detection as supervised variable selection and prediction across three phases. By employing off-the-shelf techniques for relevant-variable selection, per-variable prediction, and robust anomaly scoring, it enables domain-tailored detectors with interpretable explanations of detected anomalies. Empirical results across 32 real-world datasets show two instantiations, FBED-CART-PS and FBED-CART-Sum, outperforming nine state-of-the-art baselines in most settings and demonstrating strong performance in high-dimensional data, along with substantive interpretability through dependency deviations. The framework provides practical guidance for technique selection, emphasizes interpretability, and suggests avenues for ensemble extensions to further improve robustness and coverage of dependency-based anomalies.

Abstract

Anomaly detection is crucial for understanding unusual behaviors in data, as anomalies offer valuable insights. This paper introduces Dependency-based Anomaly Detection (DepAD), a general framework that utilizes variable dependencies to uncover meaningful anomalies with better interpretability. DepAD reframes unsupervised anomaly detection as supervised feature selection and prediction tasks, which allows users to tailor anomaly detection algorithms to their specific problems and data. We extensively evaluate representative off-the-shelf techniques for the DepAD framework. Two DepAD algorithms emerge as all-rounders and superior performers in handling a wide range of datasets compared to nine state-of-the-art anomaly detection methods. Additionally, we demonstrate that DepAD algorithms provide new and insightful interpretations for detected anomalies.
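The three phases described above (relevant-variable selection, per-variable prediction, anomaly scoring) can be illustrated with a minimal sketch. It is not the authors' implementation. Every technique here is a simplified stand-in chosen to keep the example dependency-free: absolute Pearson correlation replaces FBED for selection, a one-variable least-squares line replaces CART for prediction, and the score is a sum of clipped, standardized deviations, only loosely in the spirit of a summation-based score. The function name `depad_like_scores`, the `min_corr` threshold, and the toy height/weight data are all hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def depad_like_scores(rows, min_corr=0.5):
    """Score each row by how far it deviates from learned dependencies."""
    cols = list(zip(*rows))
    n, d = len(rows), len(cols)
    scores = [0.0] * n
    for j in range(d):
        # Phase 1 (stand-in for FBED): pick the most correlated other variable.
        best, best_r = None, min_corr
        for k in range(d):
            if k != j:
                r = abs(pearson(cols[k], cols[j]))
                if r >= best_r:
                    best, best_r = k, r
        if best is None:
            continue  # no sufficiently relevant variable for this target
        # Phase 2 (stand-in for CART): predict variable j from the selected one.
        slope, intercept = fit_line(cols[best], cols[j])
        dev = [abs(y - (slope * x + intercept))
               for x, y in zip(cols[best], cols[j])]
        # Phase 3: standardize deviations, clip at zero, and sum over variables.
        mu, sd = mean(dev), pstdev(dev)
        if sd == 0:
            continue
        for i in range(n):
            scores[i] += max(0.0, (dev[i] - mu) / sd)
    return scores


# Toy (height cm, weight kg) data; the last row breaks the usual dependency
# even though its height and weight each lie within the observed ranges.
rows = [(150, 45), (155, 50), (160, 54), (165, 59), (170, 63),
        (175, 68), (180, 72), (185, 77), (190, 81), (160, 80)]
scores = depad_like_scores(rows)
top = max(range(len(scores)), key=scores.__getitem__)
print(top, rows[top])
```

In this toy data the highest-scoring row is the one whose weight is inconsistent with its height, which is the kind of dependency deviation the framework targets. Because each variable contributes its own deviation term, the per-variable terms can also be inspected to see which dependency an object violates.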

Paper Structure

This paper contains 26 sections, 12 figures, 12 tables, 1 algorithm.

Figures (12)

  • Figure 1: An example illustrating obesity detection in a dataset with height and weight variables. A proximity-based approach mislabels $a_2$ as obese, while a dependency-based approach correctly identifies $a_1$ as obese and $a_2$ as normal.
  • Figure 2: The DepAD framework
  • Figure 3: Performance of relevant variable selection techniques. For each technique, 25 results averaged from 32 datasets are shown in black dots. The violin plot's outline indicates the Gaussian kernel density estimate of these results, with a red dot for the mean and lines indicating standard deviation. Techniques are pairwise compared using the Wilcoxon rank-sum test, with the left presumed superior; $p$-values are displayed above each pair.
  • Figure 4: Performance of different prediction models. For each technique, 25 results averaged from 32 datasets are shown in black dots. The violin plot's outline indicates the Gaussian kernel density estimate of these results, with a red dot for the mean and lines indicating standard deviation. Techniques are pairwise compared using the Wilcoxon rank-sum test, with the left presumed superior; $p$-values are displayed above each pair.
  • Figure 5: Performance of anomaly score generation techniques. For each technique, 25 results averaged from 32 datasets are shown in black dots. The violin plot's outline indicates the Gaussian kernel density estimate of these results, with a red dot for the mean and lines indicating standard deviation. Techniques are pairwise compared using the Wilcoxon rank-sum test, with the left presumed superior; $p$-values are displayed above each pair.
  • ...and 7 more figures

Theorems & Definitions (1)

  • Example 1