Robust Isolation Forest using Soft Sparse Random Projection and Valley Emphasis Method

Hun Kang, Kyoungok Kim

TL;DR

RiForest addresses the inconsistent performance of prior iForest variants by jointly leveraging original features and soft sparse random projections to form a diverse hyperplane set, and by using the valley emphasis method to determine split points. The method introduces dimension entropy-based feature selection and a variable path-length scheme to sharpen anomaly scores, achieving superior stability and robustness to noisy variables across 24 benchmark datasets. Across extensive experiments, RiForest demonstrates strong AUROC performance and lower variability compared with baselines, with ablation analysis confirming the value of its components, especially the valley-based split. This approach offers a practical, dataset-agnostic improvement for unsupervised anomaly detection in diverse domains, reducing sensitivity to noise and distributional differences.

Abstract

Isolation Forest (iForest) is an unsupervised anomaly detection algorithm designed to effectively detect anomalies under the assumption that anomalies are "few and different." Various studies have aimed to enhance iForest, but the resulting algorithms often exhibited significant performance disparities across datasets. Additionally, the challenge of isolating rare and widely distributed anomalies persisted in research focused on improving splits. To address these challenges, we introduce Robust iForest (RiForest). RiForest leverages both existing features and random hyperplanes obtained through soft sparse random projection to identify superior split features for anomaly detection, independent of datasets. It utilizes the underutilized valley emphasis method for optimal split point determination and incorporates sparsity randomization in soft sparse random projection for enhanced anomaly detection robustness. Across 24 benchmark datasets, experiments demonstrate RiForest's consistent outperformance of existing algorithms in anomaly detection, emphasizing stability and robustness to noise variables.
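
To make the split-point idea concrete, the following is a minimal sketch of valley emphasis thresholding on a single feature. It is not the authors' implementation: the function name `valley_emphasis_split`, the `bins` parameter, and the histogram discretization are assumptions made for illustration. The sketch weights Otsu's between-class variance by one minus the probability mass of the bin at the candidate threshold, so that splits falling in low-density valleys are preferred. How RiForest combines this with soft sparse random projection is not specified in this section and is not modeled here.

```python
import numpy as np


def valley_emphasis_split(values, bins=32):
    """Pick a 1-D split point by valley-emphasis-weighted Otsu thresholding.

    Assumptions (illustrative, not from the paper): the data are discretized
    into `bins` equal-width bins, and the between-class variance form of
    Otsu's criterion is used so the result is unaffected by shifting the data.
    Returns None if no candidate leaves data on both sides.
    """
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=bins)
    p = counts / counts.sum()                   # probability mass per bin
    centers = 0.5 * (edges[:-1] + edges[1:])    # representative value per bin

    best_score, best_split = -np.inf, None
    for t in range(bins - 1):                   # left class = bins 0..t
        w_left = p[: t + 1].sum()
        w_right = 1.0 - w_left
        if w_left <= 0.0 or w_right <= 0.0:
            continue                            # one side empty: not a split
        mu_left = (p[: t + 1] * centers[: t + 1]).sum() / w_left
        mu_right = (p[t + 1:] * centers[t + 1:]).sum() / w_right
        between = w_left * w_right * (mu_left - mu_right) ** 2
        score = (1.0 - p[t]) * between          # emphasize low-density bins
        if score > best_score:
            best_score, best_split = score, float(edges[t + 1])
    return best_split


# Toy data: a dense "normal" cluster and a few distant anomalies.
normal = np.linspace(0.0, 1.0, 200)
anomalies = np.linspace(9.0, 10.0, 10)
split = valley_emphasis_split(np.concatenate([normal, anomalies]))
print(split)
```

On this toy data the chosen split falls in the empty region between the two groups rather than inside the dense cluster, which is the behavior the valley weighting is meant to encourage.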

Paper Structure

This paper contains 17 sections, 9 equations, 5 figures, 6 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Comparison of anomaly separability between original features and random hyperplanes
  • Figure 2: Comparison between the methods to determine a split point
  • Figure 3: Results of the robustness experiments
  • Figure 4: Nemenyi test figures on AUROC and CV
  • Figure 5: Hyperparameter analysis