An Efficient Outlier Detection Algorithm for Data Streaming

Rui Hu; Luc Chen; Yiwei Wang

TL;DR

This work tackles the challenge of real-time outlier detection in data streams by improving LOF-based approaches. It introduces Efficient Incremental Local Outlier Factor (EILOF), which computes LOF scores only for newly arriving points and leaves existing scores unchanged, achieving significant computational savings while maintaining or improving detection performance in noisy streaming data. Through synthetic simulations and real datasets (Shuttle and Credit Card Fraud), EILOF demonstrates robustness to parameter choices and superior scalability as the number of incremental points grows. The proposed method offers practical impact for online fraud detection, sensor monitoring, and cybersecurity, with implementation details and code available for replication and deployment.
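The TL;DR refers to LOF scores. For reference, the standard LOF quantities as introduced by Breunig et al. are summarized below. The notation is the conventional one and may differ from the paper's own equations. Here $N_k(p)$ is the set of points within $p$'s k-distance, and $d$ is the distance metric.

```latex
\begin{aligned}
\operatorname{reach\text{-}dist}_k(p, o) &= \max\{\, k\text{-distance}(o),\; d(p, o) \,\} \\
\operatorname{lrd}_k(p) &= \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \operatorname{reach\text{-}dist}_k(p, o) \right)^{-1} \\
\operatorname{LOF}_k(p) &= \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\operatorname{lrd}_k(o)}{\operatorname{lrd}_k(p)}
\end{aligned}
```

A score near 1 means $p$ is about as dense as its neighbors. A score well above 1 means $p$ sits in a sparser region than its neighbors and is a candidate outlier.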

Abstract

Modern data increasingly arrive in real time, which makes outlier detection crucial across data-driven fields such as finance (fraud detection) and healthcare (monitoring patient vitals). Traditional outlier detection methods, such as the Local Outlier Factor (LOF) algorithm, struggle with real-time data because each new data point requires extensive recalculation, which limits their use in real-time environments. The Incremental LOF (ILOF) algorithm was developed to address online anomaly detection, but it remains computationally expensive when processing large streams of data points, and its detection performance may degrade once a certain number of points have streamed in. In this paper, we propose a novel approach to improving the efficiency of LOF algorithms for online anomaly detection, named the Efficient Incremental LOF (EILOF) algorithm. EILOF computes LOF scores only for new points and does not alter the LOF scores of existing data points. As a result, EILOF does not maintain exact LOF scores for existing points. However, datasets often contain noise, and minor deviations in LOF scores do not necessarily degrade detection performance. In fact, such deviations can sometimes enhance outlier detection. We systematically tested this approach on both simulated and real-world datasets and found that EILOF outperforms ILOF across various scenarios as the volume of streaming data increases. Compared with ILOF, EILOF not only significantly reduces computational cost but also consistently improves detection accuracy as the number of additional points increases.
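The abstract's central idea is to score each arriving point while leaving earlier scores untouched. The sketch below illustrates it under stated assumptions and is not the authors' implementation. The class name `FrozenIncrementalLOF` is hypothetical, and neighborhoods are taken to be exactly k points with ties broken by index. Points are assumed to be distinct, so no reachability sum is zero. We also assume the new point's k-distance and lrd are stored so that later arrivals can use it as a neighbor, while the statistics of earlier points are never updated. Whether EILOF handles these details in the same way is an assumption.

```python
import math


class FrozenIncrementalLOF:
    """Hypothetical sketch: score new points against frozen statistics of earlier points."""

    def __init__(self, initial, k):
        self.k = k
        self.points = [tuple(p) for p in initial]
        if len(self.points) <= k:
            raise ValueError("need more than k initial points")
        # Batch LOF on the initial window (exactly-k neighborhoods, assumed).
        self.neigh = [self._knn(p, skip=i) for i, p in enumerate(self.points)]
        self.kdist = [nb[-1][0] for nb in self.neigh]  # k-distance of each point
        self.lrd = [self._lrd(nb) for nb in self.neigh]
        self.lof = [self._lof(nb, self.lrd[i]) for i, nb in enumerate(self.neigh)]

    def _knn(self, q, skip=None):
        # (distance, index) pairs of the k nearest stored points; ties broken by index.
        cands = sorted((math.dist(q, p), j) for j, p in enumerate(self.points) if j != skip)
        return cands[: self.k]

    def _lrd(self, nb):
        # reach-dist_k(p, o) = max(k-distance(o), d(p, o)); lrd = 1 / mean reach-dist.
        reach = [max(self.kdist[j], d) for d, j in nb]
        return len(reach) / sum(reach)

    def _lof(self, nb, lrd_p):
        # Mean neighbor lrd divided by the point's own lrd.
        return sum(self.lrd[j] for _, j in nb) / (len(nb) * lrd_p)

    def insert(self, q):
        """Score q using the stored statistics; earlier kdist/lrd/lof are NOT updated."""
        q = tuple(q)
        nb = self._knn(q)
        lrd_q = self._lrd(nb)
        lof_q = self._lof(nb, lrd_q)
        # Store q so that later arrivals can use it as a neighbor (assumption).
        self.points.append(q)
        self.neigh.append(nb)
        self.kdist.append(nb[-1][0])
        self.lrd.append(lrd_q)
        self.lof.append(lof_q)
        return lof_q


if __name__ == "__main__":
    det = FrozenIncrementalLOF([(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)], k=2)
    print(det.insert((10, 10)))   # far from the cluster: large score
    print(det.insert((0.5, 0.4)))  # inside the cluster: score below 1
```

Under these assumptions, each insertion costs one k-nearest-neighbor query over the stored points and never revisits earlier points. An ILOF-style update would instead also refresh the k-distances, lrds, and LOF scores of the affected neighbors.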

Paper Structure

This paper contains 15 sections, 6 equations, 10 figures, 4 tables, and 2 algorithms.

Introduction
Preliminary
LOF Algorithm
ILOF Algorithm
EILOF: Efficient Incremental Local Outlier Factor
Simulation Studies
Setting of the simulated dataset
Performance of ILOF for different k and m
Performance of EILOF for different k and m
Performance Comparison in Simulation Data
Performance Comparison in Real Data
Shuttle Dataset
Credit Card Fraud Dataset
Discussion
Conclusions and Future Work

Figures (10)

Figure 1: Graphical representation showing the insertion of a new point $p_c$ and its nearest neighbors. The dashed arrows indicate that $p_c$ is not a nearest neighbor of $b$.
Figure 2: Distribution of Simulated Data Points
Figure 3: $F_{1}$ Score by Test Index for Different $k$ Values in ILOF
Figure 4: $F_{1}$ Score by $k$ for Different Sizes of Incremental Data Points ($m$) in ILOF
Figure 5: $F_{1}$ Score by Test Index for Different $k$ Values in EILOF
...and 5 more figures
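Figure 1 depends on the asymmetry of nearest-neighbor relations. A point can be one of $p_c$'s nearest neighbors without $p_c$ being one of its nearest neighbors. The sketch below computes both sets for a newly inserted point. It assumes exactly-k neighborhoods for the new point, with ties broken by index. It also uses the tie-inclusive test d(o, p_c) ≤ k-distance(o) to decide whether an existing point o gains p_c as a neighbor. In the usual ILOF formulation, this second set (the reverse k-nearest neighbors of p_c) is where k-distances must be updated. The function names and the toy coordinates are hypothetical.

```python
import math


def k_distance(points, i, k):
    """Distance from points[i] to its k-th nearest other point."""
    d = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    return d[k - 1]


def knn_and_reverse_knn(points, new, k):
    """Return (indices of new's k nearest neighbors, indices of points that gain new as a neighbor)."""
    # k nearest existing points to the new point, ties broken by index.
    order = sorted(range(len(points)), key=lambda j: (math.dist(new, points[j]), j))
    knn = order[:k]
    # Existing points o for which new falls within o's current k-distance.
    reverse = [
        o for o in range(len(points))
        if math.dist(points[o], new) <= k_distance(points, o, k)
    ]
    return knn, reverse


if __name__ == "__main__":
    square = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    print(knn_and_reverse_knn(square, (1.5, 1.5), k=2))
```

As an example, take k = 2 and insert (1.5, 1.5) next to a unit square with a center point. The new point's neighbors are indices 3 and 4, the corner (1, 1) and the center. Only index 3 gains the new point as a neighbor. The center is a neighbor of the new point, but the new point is not a neighbor of the center. This is the same one-directional relation the caption describes.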