Benchmarking Anomaly Detection Algorithms: Deep Learning and Beyond

Shanay Mehta, Shlok Mehendale, Nicole Fernandes, Jyotirmoy Sarkar, Santonu Sarkar, Snehanshu Saha

TL;DR

This study tackles the problem of detecting anomalies in complex, mission-critical systems under severe class imbalance. It conducts a comprehensive benchmark across 104 public datasets, spanning 73 multivariate and 31 univariate cases, comparing a wide range of methods from classical ML to deep learning and outlier-detection architectures. Key findings show that while deep learning dominates in multivariate settings, unsupervised tree-based evolutionary algorithms like MGBTai and DBTAI often outperform DL in univariate, small-sample scenarios, and that recent DL methods require anomaly contamination and substantial computational resources. The results provide practical guidance for practitioners on method selection across data regimes and motivate future work in hybrid, scalable, and explainable anomaly-detection approaches for real-world deployments.

Abstract

Detection of anomalous situations in complex mission-critical systems is of paramount importance when their service continuity needs to be ensured. A major challenge in detecting anomalies from operational data is the imbalanced class distribution, since anomalies are supposed to be rare events. This paper evaluates a diverse array of Machine Learning (ML)-based anomaly detection algorithms through a comprehensive benchmark study. Its main contribution is an unbiased comparison of anomaly detection algorithms spanning classical ML (including various tree-based approaches), Deep Learning (DL), and outlier detection methods. The inclusion of 104 publicly available datasets enhances the diversity of the study, allowing a more realistic evaluation of algorithm performance and emphasizing the importance of adaptability to real-world scenarios. The paper examines the general notion of DL as a universal solution, showing that, while powerful, it is not always the best fit for every scenario. The findings reveal that recently proposed tree-based evolutionary algorithms match DL methods, and sometimes outperform them, in many instances of univariate data where the dataset is small and anomalies make up less than 10% of the data. Specifically, tree-based approaches successfully detect singleton anomalies in datasets where DL falls short. To the best of the authors' knowledge, such a study of a large number of state-of-the-art algorithms on diverse datasets, with the objective of guiding researchers and practitioners in making informed algorithmic choices, has not been attempted earlier.

Paper Structure (20 sections, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Comparison of anomaly detection algorithms. Credit is given to the algorithm that obtains the highest recall. Note that the DL algorithms in the figure comprise 11 recent SOTA methods.
  • Figure 2: Visualization of the elbow point
  • Figure 3: Comparison of algorithms using various metrics on multivariate datasets
  • Figure 4: Comparison of algorithms using various metrics on univariate datasets
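
The Figure 1 caption describes crediting the algorithm with the highest recall. A minimal Python sketch of that kind of tally is given here, under stated assumptions: the algorithm names, dataset name, and labels are made up for illustration and are not results from the paper, and crediting every algorithm that ties for the top recall is an assumption, since the caption does not say how ties are handled.

```python
from collections import Counter


def recall(y_true, y_pred):
    """Recall for the anomaly class (label 1): TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def credit_by_recall(recalls_per_dataset):
    """Count, per algorithm, the datasets on which it reaches the top recall.

    Assumption: every algorithm tied for the top recall gets credit.
    """
    credits = Counter()
    for scores in recalls_per_dataset.values():
        best = max(scores.values())
        for algo, value in scores.items():
            if value == best:
                credits[algo] += 1
    return credits


# Hypothetical toy data: 10 points, 2 of them anomalies (label 1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
preds = {
    "algo_a": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],  # misses one anomaly
    "algo_b": [0, 0, 0, 1, 0, 0, 0, 0, 1, 1],  # finds both, one false alarm
}

toy_recalls = {
    "dataset_1": {name: recall(y_true, p) for name, p in preds.items()},
}
credits = credit_by_recall(toy_recalls)
print(dict(credits))
```

Recall ignores false alarms, so a recall-based tally of this kind is usually read alongside other metrics, as Figures 3 and 4 do.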