
Anomaly Detection Based on Isolation Mechanisms: A Survey

Yang Cao, Haolong Xiang, Hang Zhang, Ye Zhu, Kai Ming Ting

TL;DR

This survey addresses anomaly detection in the big-data era by focusing on isolation-based unsupervised methods. It systematically catalogs data partitioning strategies (axis-aligned splits, random hyperplanes, hyperspheres, Voronoi cells, and hashing), explains how anomaly scores are derived from geometric and similarity-based perspectives, and covers practical extensions to streaming, trajectory, time-series, and unstructured data. Key contributions include a comprehensive taxonomy of methods (e.g., iForest, SCiForest, iNNE, LSHiForest, IDK, DIF), empirical demonstrations of performance across datasets, concrete guidance on parameter and model optimization, and a discussion of open challenges such as theoretical analysis and incremental learning. The work underscores the practical impact of scalable, low-memory anomaly detection across diverse domains and outlines directions for strengthening the theory and adaptability of these methods.

Abstract

Anomaly detection is a longstanding and active research area that has many applications in domains such as finance, security, and manufacturing. However, the efficiency and performance of anomaly detection algorithms are challenged by the large-scale, high-dimensional, and heterogeneous data that are prevalent in the era of big data. Isolation-based unsupervised anomaly detection is a novel and effective approach for identifying anomalies in data. It relies on the idea that anomalies are few and different from normal instances, and thus can be easily isolated by random partitioning. Isolation-based methods have several advantages over existing methods, such as low computational complexity, low memory usage, high scalability, robustness to noise and irrelevant features, and no need for prior knowledge or heavy parameter tuning. In this survey, we review the state-of-the-art isolation-based anomaly detection methods, including their data partitioning strategies, anomaly score functions, and algorithmic details. We also discuss some extensions and applications of isolation-based methods in different scenarios, such as detecting anomalies in streaming data, time series, trajectory, and image datasets. Finally, we identify some open challenges and future directions for isolation-based anomaly detection research.
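The isolation idea described above, that anomalies are few and different and therefore separated by few random splits, can be illustrated with a minimal sketch of the classic iForest scheme (random axis-aligned partitioning), one of the methods the survey catalogs. The sketch assumes the standard iForest score s(x) = 2^(-E[h(x)] / c(ψ)), where h(x) is the number of splits needed to reach x's leaf and c(ψ) = 2H(ψ-1) - 2(ψ-1)/ψ is the average path length for a subsample of size ψ, with the harmonic number H(i) approximated as ln(i) + 0.5772. The function names (`fit_forest`, `anomaly_score`, etc.) and defaults (100 trees, ψ = 256) are hypothetical choices for illustration, not the survey's or any library's implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points,
    used to normalise path lengths (and to pad unexpanded leaves)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen attribute is constant here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }

def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for leaves cut off by max_depth."""
    if "size" in node:
        return depth + c(node["size"])
    branch = "left" if x[node["dim"]] < node["split"] else "right"
    return path_length(x, node[branch], depth + 1)

def fit_forest(data, n_trees=100, psi=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of size psi."""
    rng = random.Random(seed)
    psi = min(psi, len(data))
    if psi < 2:
        raise ValueError("need at least two points")
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi

def anomaly_score(x, trees, psi):
    """s(x) = 2 ** (-mean path length / c(psi)); values near 1 suggest an anomaly."""
    mean_h = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_h / c(psi))

# Usage: a 2-D Gaussian cluster plus one distant point.
gen = random.Random(42)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(500)] + [(8.0, 8.0)]
trees, psi = fit_forest(data)
outlier_score = anomaly_score((8.0, 8.0), trees, psi)
inlier_score = anomaly_score((0.0, 0.0), trees, psi)
print(f"outlier: {outlier_score:.3f}, inlier: {inlier_score:.3f}")
```

The distant point is typically cut off near the root, so its average path length is short and its score approaches 1, whereas a point inside the dense cluster needs many splits and scores lower. Subsampling (ψ) and the depth limit are what keep the cost and memory low, independent of the full dataset size.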

Paper Structure

This paper contains 33 sections, 13 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Demonstration of different types of anomalies. (a) A point anomaly is a single abnormal red point; a group anomaly is an abnormal red cluster. (b) A contextual group anomaly is a series of related data points that are individually normal but collectively unusual. (c) A breakage in the texture of a nut. (d) Four iris images: three are Versicolour and one, marked in red, is Setosa; they have similar low-level pixel textures but different shapes [misc_iris_53].
  • Figure 2: Illustration of LSHiForest.
  • Figure 3: Illustration of isolation forest.
  • Figure 4: Illustration of iNNE with $\psi = 5$. If $x$ lies in the overlap region of multiple partitions, it is assigned to the hypersphere generated by the subsample point nearest to $x$. Since $y$ lies outside every hypersphere, its anomaly score is 1.
  • Figure 5: Illustration of the general procedure of iForestASD
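The Figure 4 caption describes how iNNE partitions space with hyperspheres and scores points. The sketch below follows that description under stated assumptions: each of t ensemble members samples ψ points, each sampled point c gets a hypersphere whose radius τ(c) is the distance to its nearest neighbour within the sample, and a covered query x is assigned to the covering hypersphere whose centre is nearest to x, as the caption states. The score for a covered point, 1 - τ(η)/τ(c) where η is c's nearest neighbour in the sample, is assumed from the iNNE formulation rather than taken from this page; an uncovered point scores 1, matching the caption. Function names and defaults (ψ = 16, t = 100) are hypothetical.

```python
import math
import random

def build_inne_set(data, psi, rng):
    """One ensemble member: psi sampled centres, each with a hypersphere whose
    radius is the distance to its nearest neighbour within the sample."""
    sample = rng.sample(data, psi)
    spheres = []  # (centre, radius, index of the centre's nearest neighbour)
    for i, centre in enumerate(sample):
        j, radius = min(
            ((k, math.dist(centre, other)) for k, other in enumerate(sample) if k != i),
            key=lambda pair: pair[1],
        )
        spheres.append((centre, radius, j))
    return spheres

def score_one_set(x, spheres):
    """Isolation score of x under one set of hyperspheres, in [0, 1]."""
    covering = [s for s in spheres if math.dist(x, s[0]) <= s[1]]
    if not covering:
        return 1.0  # x lies outside every hypersphere (like y in Figure 4)
    # Assignment rule from the Figure 4 caption: nearest covering centre.
    _centre, radius, nn_index = min(covering, key=lambda s: math.dist(x, s[0]))
    if radius == 0.0:
        return 0.0  # degenerate sphere from duplicate sample points
    nn_radius = spheres[nn_index][1]  # tau of the centre's nearest neighbour
    return 1.0 - nn_radius / radius

def inne_scores(data, queries, psi=16, t=100, seed=0):
    """Average the per-set scores over t independently sampled sets."""
    if not 2 <= psi <= len(data):
        raise ValueError("psi must be between 2 and len(data)")
    rng = random.Random(seed)
    sets = [build_inne_set(data, psi, rng) for _ in range(t)]
    return [sum(score_one_set(q, s) for s in sets) / t for q in queries]

# Usage: a 2-D Gaussian cluster, one far-away query, one query at its centre.
gen = random.Random(1)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(300)]
far_score, centre_score = inne_scores(data, [(20.0, 20.0), (0.0, 0.0)])
print(f"far: {far_score:.3f}, centre: {centre_score:.3f}")
```

Because τ(η) can never exceed τ(c) (η's nearest neighbour is at most as far as c itself), each per-set score stays in [0, 1]. Some descriptions of iNNE select the covering hypersphere with the smallest radius instead of the nearest centre; changing the `key` in `score_one_set` to `lambda s: s[1]` switches to that rule.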

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5