Anomaly Detection Based on Isolation Mechanisms: A Survey
Yang Cao, Haolong Xiang, Hang Zhang, Ye Zhu, Kai Ming Ting
TL;DR
This survey reviews isolation-based unsupervised anomaly detection for the big-data era. It systematically catalogs data partitioning strategies (axis-aligned splits, random hyperplanes, hyperspheres, Voronoi diagrams, and hash-based partitions), anomaly scoring from geometric and similarity-based perspectives, and practical extensions to streaming, trajectory, time-series, and unstructured data. Key contributions include a comprehensive taxonomy of methods (e.g., iForest, SCiForest, iNNE, LSHiForest, IDK, DIF), demonstrations of performance across datasets, and concrete guidance on parameter and model optimization. The survey also identifies open challenges, such as theoretical analysis and incremental learning, and underscores the practical impact of scalable, low-memory anomaly detection across diverse domains.
Abstract
Anomaly detection is a longstanding and active research area that has many applications in domains such as finance, security, and manufacturing. However, the efficiency and performance of anomaly detection algorithms are challenged by the large-scale, high-dimensional, and heterogeneous data that are prevalent in the era of big data. Isolation-based unsupervised anomaly detection is a novel and effective approach for identifying anomalies in data. It relies on the idea that anomalies are few and different from normal instances, and thus can be easily isolated by random partitioning. Isolation-based methods have several advantages over existing methods, such as low computational complexity, low memory usage, high scalability, robustness to noise and irrelevant features, and no need for prior knowledge or heavy parameter tuning. In this survey, we review the state-of-the-art isolation-based anomaly detection methods, including their data partitioning strategies, anomaly score functions, and algorithmic details. We also discuss some extensions and applications of isolation-based methods in different scenarios, such as detecting anomalies in streaming data, time series, trajectory, and image datasets. Finally, we identify some open challenges and future directions for isolation-based anomaly detection research.
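
The isolation idea stated in the abstract (anomalies are few and different, so random partitioning separates them quickly) can be illustrated with a minimal sketch. This is not the implementation of iForest or of any other surveyed method. It is a simplified illustration that assumes axis-aligned random splits, and the names `isolation_depth`, `mean_depth`, `n_trees`, and `sample_size` are hypothetical, chosen only for this example. It counts how many random splits are needed to separate a point from a random subsample and averages that count over many trials.

```python
import random


def isolation_depth(point, data, rng, max_depth=50):
    """Count random axis-aligned splits needed to separate `point` from `data`.

    `data` must contain `point`. Illustrative helper, not a library function.
    """
    subset = data
    depth = 0
    while len(subset) > 1 and depth < max_depth:
        dim = rng.randrange(len(point))          # pick a random attribute
        lo = min(row[dim] for row in subset)
        hi = max(row[dim] for row in subset)
        if lo == hi:
            # No spread along this attribute; count the attempt and retry.
            depth += 1
            continue
        split = rng.uniform(lo, hi)              # pick a random split value
        # Keep only the side of the split that contains `point`.
        if point[dim] < split:
            subset = [row for row in subset if row[dim] < split]
        else:
            subset = [row for row in subset if row[dim] >= split]
        depth += 1
    return depth


def mean_depth(point, data, n_trees=200, sample_size=64, seed=0):
    """Average isolation depth of `point` over random subsamples of `data`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        # Drop copies of `point` so it appears exactly once in the subset.
        others = [row for row in sample if row != point]
        total += isolation_depth(point, others + [point], rng)
    return total / n_trees


# Synthetic data (an assumption for illustration): one Gaussian cluster,
# one point at its centre, and one point far away from it.
gen = random.Random(42)
normal = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(500)]
inlier = (0.0, 0.0)
outlier = (8.0, 8.0)
data = normal + [inlier, outlier]

inlier_depth = mean_depth(inlier, data)
outlier_depth = mean_depth(outlier, data)
print(f"mean depth of inlier:  {inlier_depth:.2f}")
print(f"mean depth of outlier: {outlier_depth:.2f}")
```

In this sketch the distant point is separated after fewer splits on average than the point in the middle of the cluster, which is the property that isolation-based scores build on. Subsampling and averaging over many random partitionings are included only to show the general shape of the approach; the specific values used here are arbitrary.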
