Detecting Anomalies Using Rotated Isolation Forest

Vahideh Monemizadeh, Kourosh Kiani

TL;DR

This work identifies ghost-cluster artifacts in iForest and EIF that impair anomaly detection. It proposes Rotated Isolation Forest (RIF), which uses QR-based random rotations prior to iForest construction to diversify representations and remove ghost regions. Through comprehensive experiments on synthetic and real datasets, RIF consistently outperforms both iForest and EIF in AUC and scoring stability, including high-dimensional scenarios. The approach yields simpler bookkeeping than EIF while providing superior robustness and scalability, with significant practical impact for unsupervised anomaly detection in complex data.

Abstract

The Isolation Forest (iForest), proposed by Liu, Ting, and Zhou (TKDD 2012), has become a prominent tool for unsupervised anomaly detection. However, Hariri, Kind, and Brunner (TKDE 2021) revealed a problem with iForest: axis-aligned ghost clusters that can be misidentified as normal clusters, which biases anomaly scores and leads to inaccurate predictions. In response, they developed the Extended Isolation Forest (EIF), which eliminates the ghost clusters introduced by iForest and thereby improves both the consistency of anomaly scores and detection performance. We reveal a previously overlooked problem in EIF: it is vulnerable to ghost inter-clusters that appear between normal clusters of data points. In this paper, we introduce the Rotated Isolation Forest (RIF) algorithm, which addresses both the axis-aligned ghost clusters of iForest and the ghost inter-clusters of EIF. RIF randomly rotates the dataset, using random rotation matrices generated via QR decomposition, before feeding it into iForest construction. This increases the variation of the data seen by the trees and eliminates ghost clusters. Our experiments on both synthetic and real-world datasets demonstrate that RIF outperforms iForest and EIF.
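The rotate-then-isolate idea can be sketched in a few dozen lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation. We assume:

- one independent random rotation per tree;
- each rotation is built by QR-decomposing a Gaussian matrix, with the usual sign and determinant fixes;
- points are scored with the standard iForest formula $s(x) = 2^{-E[h(x)]/c(\psi)}$.

The helper names (`random_rotation`, `c_factor`, `build_tree`, `path_length`, `rif_scores`) are hypothetical. So are the defaults (100 trees, subsample size 256, borrowed from the usual iForest settings).

```python
import numpy as np


def random_rotation(d, rng):
    """Draw a random d x d rotation matrix via QR decomposition of a Gaussian matrix."""
    a = rng.standard_normal((d, d))
    q, r = np.linalg.qr(a)
    # Rescale each column by the sign of R's diagonal so Q is Haar-distributed.
    q = q * np.sign(np.diag(r))
    # Negate one column if needed so det(Q) = +1 (a rotation, not a reflection).
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points (iForest normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n


def build_tree(x, depth, max_depth, rng):
    """Ordinary isolation tree: random feature, random threshold in its range, recurse."""
    n = len(x)
    if depth >= max_depth or n <= 1:
        return ("leaf", n)
    j = rng.integers(x.shape[1])
    lo, hi = x[:, j].min(), x[:, j].max()
    if lo == hi:  # cannot split on a constant feature; stop here
        return ("leaf", n)
    t = rng.uniform(lo, hi)
    left = x[:, j] < t
    return ("node", j, t,
            build_tree(x[left], depth + 1, max_depth, rng),
            build_tree(x[~left], depth + 1, max_depth, rng))


def path_length(tree, point, depth=0):
    """Depth at which `point` reaches a leaf, plus c(leaf size) for unexpanded leaves."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, j, t, left, right = tree
    return path_length(left if point[j] < t else right, point, depth + 1)


def rif_scores(x, n_trees=100, sample_size=256, seed=0):
    """Anomaly scores in (0, 1]; higher means more anomalous. Assumes at least 2 points."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    psi = min(sample_size, n)
    max_depth = int(np.ceil(np.log2(psi)))
    total = np.zeros(n)
    for _ in range(n_trees):
        rot = random_rotation(d, rng)   # assumption: a fresh rotation for every tree
        xr = x @ rot                    # rotate every point into this tree's frame
        sample = xr[rng.choice(n, size=psi, replace=False)]
        tree = build_tree(sample, 0, max_depth, rng)
        total += np.array([path_length(tree, p) for p in xr])
    mean_depth = total / n_trees
    return 2.0 ** (-mean_depth / c_factor(psi))


# Usage: 200 Gaussian inliers plus one far-away point at (8, 8).
rng = np.random.default_rng(42)
x = np.vstack([rng.standard_normal((200, 2)), [[8.0, 8.0]]])
scores = rif_scores(x, n_trees=100, seed=1)
print(scores[-1], scores[:-1].max())
```

In this sketch every split is still axis-aligned within the rotated frame. Each node therefore keeps plain iForest bookkeeping (one feature index and one threshold), and the only extra state is one $d \times d$ matrix per tree. Whether the paper draws a rotation per tree or shares rotations across trees is an assumption here.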

Paper Structure

This paper contains 35 sections, 1 equation, 13 figures, 4 tables, and 2 algorithms.

Figures (13)

  • Figure 1: Rotation improves separability.
  • Figure 2: Ghost clusters observed in the iForest output for a single Gaussian distribution.
  • Figure 3: Ghost cluster phenomenon for one Gaussian distribution.
  • Figure 4: Ghost cluster phenomenon for two Gaussian distributions.
  • Figure 5: Two Gaussian distributions with three anomalous points between the two Gaussian clusters. The contamination is $0.0045$, and the AUC is $1.0$ for iForest but only $0.83$ for EIF.
  • ...and 8 more figures
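The figure captions report detection quality as AUC. ROC AUC equals the probability that a randomly chosen anomaly receives a higher anomaly score than a randomly chosen normal point, with ties counted as one half.

A minimal pure-Python sketch of that definition follows. The helper name `auc` is ours; in practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity.

```python
def auc(labels, scores):
    """ROC AUC as P(score of a random anomaly > score of a random normal point).

    labels: 1 for anomaly, 0 for normal; scores: higher means more anomalous.
    Ties between an anomaly and a normal point count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three anomalies: two outrank every normal point, the third outranks only half of them.
labels = [0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.9, 0.8, 0.25]
print(round(auc(labels, scores), 3))  # 0.833
```

With only three anomalies, each anomaly carries one third of the AUC. In the toy example above, a single anomaly scored like a middling normal point is enough to give $(1 + 1 + 0.5)/3 \approx 0.83$.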

Theorems & Definitions (1)

  • Definition 1: Rotation matrix
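Definition 1 is only named here. For reference, the standard definition of a rotation matrix and one common recipe for drawing a random one via QR decomposition are given below. We assume, without being able to confirm it from this summary, that the paper's definition and construction agree with them. The symbols $A$, $Q$, $U$, $R$, $X$ are ours.

$$
R \in \mathbb{R}^{d \times d} \text{ is a rotation matrix} \iff R^\top R = R R^\top = I_d \ \text{ and } \ \det R = +1
$$

$$
A_{ij} \overset{\text{iid}}{\sim} \mathcal{N}(0, 1), \qquad A = Q U \quad (Q \text{ orthogonal},\ U \text{ upper triangular})
$$

$$
R = Q \, \operatorname{diag}\big(\operatorname{sign}(u_{11}), \dots, \operatorname{sign}(u_{dd})\big), \quad \text{then negate one column of } R \text{ if } \det R = -1
$$

$$
X_{\text{rot}} = X R \qquad (\text{rows of } X \text{ are data points})
$$

Because $R$ is orthogonal, $\lVert xR - yR \rVert = \lVert x - y \rVert$ for any two points $x, y$. Rotation therefore leaves the geometry of the data unchanged and only changes the directions along which the axis-aligned iForest splits cut.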