IDK-S: Incremental Distributional Kernel for Streaming Anomaly Detection
Yang Xu, Yixiao Ma, Kaifeng Zhang, Zuliang Yang, Kai Ming Ting
TL;DR
This work tackles streaming anomaly detection under concept drift by introducing IDK-S, an Incremental Distributional Kernel that adapts the offline Isolation Distributional Kernel (IDK) to streams through a lightweight, data-driven partition update. IDK-S treats IDK as an ensemble of weak hypersphere detectors and replaces only those linked to obsolete data, which makes it statistically equivalent to full retraining at a much lower time and memory cost. The framework is initialized on the first window and then updated continuously as new data arrive, so that it always holds a current distributional representation against which incoming instances can be scored efficiently. Across 13 benchmarks, IDK-S delivers state-of-the-art detection accuracy with substantial speedups, enabling real-time anomaly detection in high-volume streaming environments.
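The update idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `IncrementalSketch`, its methods, the parameters `t` (number of partitions) and `psi` (hyperspheres per partition), the batch-based sliding window, and the rule "a hypersphere is obsolete when its center came from an expired batch" are all assumptions made for illustration. For clarity the sketch recomputes the window embedding from scratch after each update, so it shows only which detectors get replaced, not the cost savings.

```python
import math
import random
from collections import deque


def radii_of(centers):
    """Radius of each hypersphere: distance from its center to the nearest other center."""
    return [min(math.dist(c, o) for j, o in enumerate(centers) if j != i)
            for i, c in enumerate(centers)]


def cell_of(x, centers, radii):
    """Index of the hypersphere covering x (that of its nearest center), else None."""
    i = min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
    return i if math.dist(x, centers[i]) <= radii[i] else None


class IncrementalSketch:
    """Hypothetical sliding-window detector; an illustration, not the paper's algorithm."""

    def __init__(self, t=30, psi=4, seed=0):
        self.t, self.psi = t, psi          # psi must be at least 2
        self.rng = random.Random(seed)
        self.batches = deque()             # (batch_id, points) pairs in the window
        self.parts = []                    # one dict per partition
        self.next_id = 0

    def _push(self, batch):
        bid, self.next_id = self.next_id, self.next_id + 1
        self.batches.append((bid, [tuple(p) for p in batch]))
        return bid

    def fit(self, first_window):
        """Initialise on the first window, given as a list of batches."""
        self.batches = deque(maxlen=len(first_window))
        for batch in first_window:
            self._push(batch)
        pool = [(b, p) for b, pts in self.batches for p in pts]
        self.parts = []
        for _ in range(self.t):
            picks = self.rng.sample(pool, self.psi)
            self.parts.append({"ids": [b for b, _ in picks],
                               "centers": [p for _, p in picks]})
        self._refresh()

    def update(self, batch):
        """Slide the window by one batch; swap only centers from the expired batch."""
        bid = self._push(batch)            # deque(maxlen=...) drops the oldest batch
        fresh = self.batches[-1][1]
        live = {b for b, _ in self.batches}
        swapped = 0
        for part in self.parts:
            for k, b in enumerate(part["ids"]):
                if b not in live:          # hypersphere tied to obsolete data
                    part["centers"][k] = self.rng.choice(fresh)
                    part["ids"][k] = bid
                    swapped += 1
        self._refresh()
        return swapped

    def _refresh(self):
        """Recompute radii and the per-hypersphere share of window points (not optimised)."""
        window = [p for _, pts in self.batches for p in pts]
        for part in self.parts:
            part["radii"] = radii_of(part["centers"])
            mass = [0.0] * self.psi
            for p in window:
                c = cell_of(p, part["centers"], part["radii"])
                if c is not None:
                    mass[c] += 1.0 / len(window)
            part["mass"] = mass

    def score(self, x):
        """Average window mass of the hypersphere covering x; lower means more anomalous."""
        total = 0.0
        for part in self.parts:
            c = cell_of(x, part["centers"], part["radii"])
            if c is not None:
                total += part["mass"][c]
        return total / self.t
```

A caller would pass the first window to `fit` as a list of batches, call `update(batch)` for each new batch, and flag incoming points whose `score` is low. The return value of `update` reports how many hyperspheres were swapped, which in this sketch is typically a fraction of the `t * psi` total rather than all of them.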
Abstract
Anomaly detection on data streams presents significant challenges: a method must maintain high detection accuracy under evolving distributions while remaining efficient enough for real-time use. Here we introduce $\mathcal{IDK}$-$\mathcal{S}$, a novel $\mathbf{I}$ncremental $\mathbf{D}$istributional $\mathbf{K}$ernel for $\mathbf{S}$treaming anomaly detection that addresses these challenges by creating a new dynamic representation in the kernel mean embedding framework. The advantage of $\mathcal{IDK}$-$\mathcal{S}$ rests on two key innovations. First, it inherits the strengths of the Isolation Distributional Kernel, an offline detector whose data-dependent kernel has given it significant performance advantages over foundational methods such as Isolation Forest and Local Outlier Factor. Second, it adopts a lightweight incremental update mechanism that significantly reduces computational overhead compared with the naive baseline of fully retraining the model. This is achieved without compromising detection accuracy, a claim supported by its statistical equivalence to the fully retrained model. Our extensive experiments on thirteen benchmarks demonstrate that $\mathcal{IDK}$-$\mathcal{S}$ achieves superior detection accuracy while operating substantially faster, in many cases by an order of magnitude, than existing state-of-the-art methods.
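For readers unfamiliar with the kernel mean embedding framework mentioned in the abstract, the generic textbook form is given below. The symbols $\kappa$, $\Phi$, $D$, $n$ and $\hat{\mu}_D$ are notation assumed here for illustration and are not taken from the paper, whose exact definitions may differ. For a kernel $\kappa(x, y) = \langle \Phi(x), \Phi(y) \rangle$ and a window $D = \{x_1, \dots, x_n\}$, the empirical mean embedding and the similarity of a point $x$ to the window are

$$
\hat{\mu}_D = \frac{1}{n} \sum_{i=1}^{n} \Phi(x_i),
\qquad
\langle \Phi(x), \hat{\mu}_D \rangle = \frac{1}{n} \sum_{i=1}^{n} \kappa(x, x_i).
$$

Under this view, a point with low similarity to the embedding of the current window is a candidate anomaly, and a streaming method needs to keep $\hat{\mu}_D$ up to date as $D$ changes.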
