
Impact of Inaccurate Contamination Ratio on Robust Unsupervised Anomaly Detection

Jordan F. Masakuna, D'Jeff Kanda Nkashama, Arian Soltani, Marc Frappier, Pierre-Martin Tardif, Froduald Kabanza

TL;DR

The paper examines how inaccuracies in the contamination ratio affect robust unsupervised anomaly detection. It experiments with shallow models, including Isolation Forest (IF), Local Outlier Factor (LOF), and One-Class SVM (OCSVM), across six benchmark data sets to test their resilience to misinformed contamination ratios. Contrary to expectations, the results show that these models remain robust and can even benefit from inaccurate contamination information. The findings have implications for deploying anomaly detection systems on noisy real-world data and highlight the need for further analysis of the mechanisms behind this robustness.

Abstract

Training data sets intended for unsupervised anomaly detection, typically presumed to be anomaly-free, often contain anomalies (or contamination), a challenge that significantly undermines model performance. Most robust unsupervised anomaly detection models rely on contamination ratio information to tackle contamination. However, in reality, the contamination ratio may be inaccurate. We investigate the impact of inaccurate contamination ratio information on robust unsupervised anomaly detection, verifying whether such models are resilient to misinformed contamination ratios. Our investigation on 6 benchmark data sets reveals that these models are not adversely affected by exposure to misinformation. In fact, they can exhibit improved performance when provided with inaccurate contamination ratios.
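Detectors of this kind typically turn the contamination ratio into a decision threshold. In scikit-learn, for instance, IsolationForest sets its offset to the 100·c-th percentile of the training scores, so that a fraction c of the training points is flagged as anomalous. The sketch below illustrates that mechanism in isolation, assuming higher scores mean "more normal" (scikit-learn's `score_samples` convention); the function names are hypothetical and the (0, 0.5] range check mirrors scikit-learn's restriction on `contamination`.

```python
import numpy as np


def threshold_from_contamination(scores, contamination):
    """Hypothetical helper: score cutoff below which points are flagged.

    Assumes higher scores mean "more normal". A fraction `contamination`
    of the given scores falls below the returned cutoff.
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    return np.percentile(scores, 100.0 * contamination)


def flag_anomalies(scores, threshold):
    # -1 marks an anomaly and +1 an inlier (scikit-learn's label convention)
    return np.where(scores < threshold, -1, 1)


if __name__ == "__main__":
    scores = np.arange(100, dtype=float)  # toy anomaly scores
    for c in (0.05, 0.10):
        labels = flag_anomalies(scores, threshold_from_contamination(scores, c))
        print(c, int((labels == -1).sum()))  # 0.05 -> 5 flagged, 0.1 -> 10 flagged
```

An overstated ratio moves the cutoff up and flags more points; an understated one flags fewer. This is why a misinformed contamination ratio would be expected to change a model's accuracy.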

Paper Structure

This paper contains 2 sections and 3 figures.

Figures (3)

  • Figure 1: Expected behavior. The red dashed line marks the true contamination ratio.
  • Figure 2: Visualization of training data sets.
  • Figure 3: Models' accuracy against misinformed contamination ratios (x-axis). Red dashed lines mark the true contamination ratios.
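The kind of sweep plotted in Figure 3 can be sketched with scikit-learn's implementations of the three shallow models: fit each model on contaminated training data while passing it an assumed (possibly wrong) contamination ratio, then measure accuracy on labeled test data. This is an illustration only, not the paper's protocol. The synthetic Gaussian-plus-uniform data, the true ratio of 0.05, the swept ratios, and the helper names `make_data`, `build`, and `sweep` are all hypothetical; the paper's six benchmark data sets are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

TRUE_RATIO = 0.05  # hypothetical true contamination of the training set


def make_data(n, ratio, rng):
    """Hypothetical data: Gaussian inliers (label 1) plus uniform outliers (label -1)."""
    n_out = int(round(n * ratio))
    inliers = rng.normal(0.0, 1.0, size=(n - n_out, 2))
    outliers = rng.uniform(-6.0, 6.0, size=(n_out, 2))
    X = np.vstack([inliers, outliers])
    y = np.concatenate([np.ones(n - n_out, dtype=int), -np.ones(n_out, dtype=int)])
    return X, y


def build(name, c):
    # The assumed ratio c goes to `contamination` (IF, LOF) or `nu` (OCSVM).
    if name == "IF":
        return IsolationForest(contamination=c, random_state=0)
    if name == "LOF":
        # novelty=True lets the fitted LOF predict on unseen test points
        return LocalOutlierFactor(n_neighbors=20, contamination=c, novelty=True)
    return OneClassSVM(nu=c, gamma="scale")


def sweep(assumed_ratios, seed=0):
    rng = np.random.default_rng(seed)
    X_train, _ = make_data(1000, TRUE_RATIO, rng)  # labels unused: unsupervised training
    X_test, y_test = make_data(1000, TRUE_RATIO, rng)
    results = {}
    for name in ("IF", "LOF", "OCSVM"):
        for c in assumed_ratios:
            model = build(name, c).fit(X_train)
            # predict() returns +1 for inliers and -1 for anomalies
            results[(name, c)] = float(np.mean(model.predict(X_test) == y_test))
    return results


if __name__ == "__main__":
    for (name, c), acc in sweep([0.01, 0.05, 0.10, 0.20]).items():
        marker = " (true ratio)" if c == TRUE_RATIO else ""
        print(f"{name:6s} assumed={c:.2f} accuracy={acc:.3f}{marker}")
```

For OCSVM the assumed ratio is passed as `nu`, which bounds the fraction of training points treated as outliers rather than setting a score percentile. It therefore plays the role of the contamination ratio without being identical to it.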