IDK-S: Incremental Distributional Kernel for Streaming Anomaly Detection

Yang Xu, Yixiao Ma, Kaifeng Zhang, Zuliang Yang, Kai Ming Ting

TL;DR

This work tackles streaming anomaly detection under concept drift by introducing IDK-S, an Incremental Distributional Kernel that adapts the powerful IDK via a lightweight, data-driven partition update. By treating IDK as an ensemble of weak hypersphere detectors and replacing only those linked to obsolete data, IDK-S achieves statistical equivalence to full retraining while dramatically reducing time and memory costs. The framework initializes on the first window and continuously updates as new data arrive, maintaining a current distributional representation and efficient scores for incoming instances. Across 13 benchmarks, IDK-S delivers state-of-the-art detection accuracy with substantial speedups, enabling real-time anomaly detection in high-volume streaming environments.
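The replace-only-what-is-obsolete idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The function name `slide_samples` and the use of integer item ids are hypothetical, and it assumes that each partition is tied to one sampled point and that at least as many points arrive as leave.

```python
import random

def slide_samples(samples, expired, arrivals, rng):
    """Replace every sampled point that left the window with a distinct new arrival.

    samples  : list of item ids currently tied to partitions
    expired  : set of item ids that dropped out of the window
    arrivals : list of item ids that just entered the window
    Returns the updated sample list and the positions that were replaced.
    """
    # Positions whose sampled point is no longer inside the window.
    stale = [pos for pos, s in enumerate(samples) if s in expired]
    # Draw one distinct new arrival per stale position (assumes enough arrivals).
    fresh = rng.sample(arrivals, k=len(stale))
    updated = list(samples)
    for pos, new_id in zip(stale, fresh):
        updated[pos] = new_id
    return updated, stale

# Hypothetical example: window ids 0..9 slides by three items to ids 3..12.
rng = random.Random(0)
samples = [1, 4, 7]
updated, stale = slide_samples(samples, expired={0, 1, 2}, arrivals=[10, 11, 12], rng=rng)
```

Only the partition tied to item 1 is rebuilt here. The partitions tied to items 4 and 7 are left alone, which is where the saving over full retraining would come from.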

Abstract

Anomaly detection on data streams presents significant challenges, requiring methods to maintain high detection accuracy under evolving distributions while ensuring real-time efficiency. Here we introduce $\mathcal{IDK}$-$\mathcal{S}$, a novel $\mathbf{I}$ncremental $\mathbf{D}$istributional $\mathbf{K}$ernel for $\mathbf{S}$treaming anomaly detection that effectively addresses these challenges by creating a new dynamic representation in the kernel mean embedding framework. The superiority of $\mathcal{IDK}$-$\mathcal{S}$ is attributed to two key innovations. First, it inherits the strengths of the Isolation Distributional Kernel, an offline detector that has demonstrated significant performance advantages over foundational methods like Isolation Forest and Local Outlier Factor due to the use of a data-dependent kernel. Second, it adopts a lightweight incremental update mechanism that significantly reduces computational overhead compared to the naive baseline strategy of full model retraining. This is achieved without compromising detection accuracy, a claim supported by its statistical equivalence to the fully retrained model. Our extensive experiments on thirteen benchmarks demonstrate that $\mathcal{IDK}$-$\mathcal{S}$ achieves superior detection accuracy while operating substantially faster, in many cases by an order of magnitude, than existing state-of-the-art methods.

Paper Structure

This paper contains 22 sections, 1 theorem, 6 equations, 5 figures, 4 tables.

Key Result

PROPOSITION 1

Consider a data window $\mathbf{X}_i$ of size $\omega$ at any time step $i\in\mathbb{N}$, and let $\mathbf{S}_i \subset \mathbf{X}_i$ be the set of $\psi$ unique sample points used to generate the model partitions $\Phi_i$. The probability of obtaining a specific set $\mathbf{S}_i$ using the full retraining … where $\binom{\omega}{\psi}$ is the binomial coefficient, representing the total number of ways to …
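The statement above is cut off in this summary, so the following does not reproduce its formula. It only illustrates the binomial coefficient it mentions, under the assumption that the $\psi$ sample points are drawn uniformly without replacement from a window of size $\omega$. In that case every $\psi$-subset is equally likely, with probability $1/\binom{\omega}{\psi}$. The function names are hypothetical.

```python
import random
from collections import Counter
from math import comb

def uniform_subset_probability(omega, psi):
    """Probability of one specific psi-subset under uniform sampling without replacement."""
    return 1 / comb(omega, psi)

def empirical_subset_frequencies(omega, psi, draws, seed=0):
    """Draw psi-subsets of range(omega) repeatedly and return their relative frequencies."""
    rng = random.Random(seed)
    counts = Counter(
        tuple(sorted(rng.sample(range(omega), psi))) for _ in range(draws)
    )
    return {subset: n / draws for subset, n in counts.items()}

# Small hypothetical window so that every subset can be enumerated.
omega, psi = 5, 2
p = uniform_subset_probability(omega, psi)
freqs = empirical_subset_frequencies(omega, psi, draws=20000)
```

With $\omega = 5$ and $\psi = 2$ there are $\binom{5}{2} = 10$ possible subsets, and the empirical frequencies in `freqs` should each lie close to `p`.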

Figures (5)

  • Figure 1: An illustration of $\mathcal{IDK}$-$\mathcal{S}$'s adaptivity to concept drift. (a) A data stream where the normal distribution (blue clusters) shifts over four time steps. (b) The corresponding normal score distribution estimated by $\mathcal{IDK}$-$\mathcal{S}$. The heatmap (brighter areas indicate higher normal scores, i.e., higher normality) dynamically follows the evolving normality, demonstrating the model's ability to maintain accurate detection in a non-stationary online environment.
  • Figure 2: An illustration of the feature map $\Phi$ of IK in a Hilbert space $\mathscr{H}$ with one partitioning ($t=1$) of three hyperspheres ($\psi=3$), each centred at a blue dot that is randomly selected from the given dataset $D$. When a point $x\in D$ falls into an overlapping region, it is assigned to the hypersphere whose centre is closer to $x$.
  • Figure 3: An illustration of the incremental update mechanism of $\mathcal{IDK}$-$\mathcal{S}$. As the data slides from $\mathbf{X}_i$ to $\mathbf{X}_{i+1}$, the model updates by discarding the partition tied to an obsolete sample ($s_1$, yellow) and generating a new partition from an incoming sample ($s_4$, blue). This process selectively modifies the set of hypersphere partitions, updating the model from $\Theta_i$ to $\Theta_{i+1}$.
  • Figure 4: The performance of methods on non-stationary datasets where $\mathcal{P}_{N}$ and $\mathcal{P}_{A}$ change over time.
  • Figure 5: Distribution of the optimal sample size $\psi$.
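The single-partitioning feature map described in the Figure 2 caption can be sketched as follows. This is a hedged illustration rather than the paper's code. The caption does not state how the radii are chosen, so the sketch assumes each radius is the distance from a centre to its nearest other centre. The function names are hypothetical.

```python
import numpy as np

def sample_centres(data, psi, rng):
    """Pick psi distinct rows of data as hypersphere centres."""
    idx = rng.choice(len(data), size=psi, replace=False)
    return data[idx]

def nearest_centre_radii(centres):
    """Assumed radius rule: distance from each centre to its nearest other centre."""
    d = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # ignore the zero distance of a centre to itself
    return d.min(axis=1)

def feature_map(x, centres, radii):
    """One-hot vector of length psi; all zeros if x lies outside every hypersphere."""
    dist = np.linalg.norm(centres - x, axis=1)
    inside = dist <= radii
    phi = np.zeros(len(centres))
    if inside.any():
        # In an overlapping region, the hypersphere with the closer centre wins.
        winner = int(np.argmin(np.where(inside, dist, np.inf)))
        phi[winner] = 1.0
    return phi

# Hypothetical example with psi = 3 centres on a line.
centres = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
radii = nearest_centre_radii(centres)
phi_overlap = feature_map(np.array([0.4, 0.0]), centres, radii)
phi_outside = feature_map(np.array([10.0, 10.0]), centres, radii)
```

A point that falls outside all hyperspheres maps to the zero vector under this sketch. Averaging such vectors over many independent partitionings is one way an ensemble of weak detectors could be combined.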

Theorems & Definitions (1)

  • PROPOSITION 1: Sampling Distribution Equivalence