
Change Detection in Multivariate data streams: Online Analysis with Kernel-QuantTree

Michelangelo Olmo Nogara Notarianni, Filippo Leveni, Diego Stucchi, Luca Frittoli, Giacomo Boracchi

TL;DR

This work addresses online change detection in multivariate data streams under non-parametric assumptions and targeted false-alarm control. It introduces KQT-EWMA, which combines a Kernel-QuantTree histogram of the baseline distribution with an EWMA-based statistic to monitor the stream online, and uses Dirichlet-based Monte Carlo procedures to set thresholds that guarantee the target $ARL_0$ regardless of the data distribution. The key contributions are a fully non-parametric online detector with a pre-specified $ARL_0$, a detailed complexity analysis, and extensive experiments showing state-of-the-art detection delays, especially for complex distributions, while maintaining false-alarm control. The results are relevant to real-time monitoring in industrial, network, and security applications, where rapid detection and reliable false-alarm rates are essential.
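Because the monitoring statistic depends on the stream only through bin indices, stationary behaviour can be simulated without access to the data. With a constant per-step false-alarm probability $\alpha$, the stationary run length is geometric with mean $1/\alpha$, so setting $\alpha = 1/ARL_0$ yields the target $ARL_0$. The Python sketch below illustrates this kind of Monte Carlo calibration under stated assumptions: $K$ equal-mass bins, a QT-EWMA-style statistic $T_t = \sum_k (Z_{k,t} - \pi_k)^2 / \pi_k$, and true bin probabilities drawn from a Dirichlet distribution with concentration $N\pi_k$. The function name `monte_carlo_thresholds` and all parameter values are hypothetical; this is not the authors' procedure.

```python
import numpy as np


def monte_carlo_thresholds(K, N, lam, arl0, horizon, n_streams=10_000, seed=0):
    """Sketch of time-varying thresholds h_t with constant per-step
    false-alarm probability alpha = 1 / arl0.

    Assumptions (illustrative, not the paper's exact procedure):
      * K bins with equal target proportions pi_k = 1 / K;
      * true stationary bin probabilities ~ Dirichlet(N * pi), modelling the
        randomness of a histogram fitted on N training samples;
      * statistic T_t = sum_k (Z_{k,t} - pi_k)^2 / pi_k with EWMA weight lam.
    """
    rng = np.random.default_rng(seed)
    alpha = 1.0 / arl0
    pi = np.full(K, 1.0 / K)
    p = rng.dirichlet(N * pi, size=n_streams)  # one "true" probability vector per stream
    cdf = np.cumsum(p, axis=1)
    z = np.tile(pi, (n_streams, 1))            # EWMA of bin indicators, started at pi
    alive = np.ones(n_streams, dtype=bool)     # streams with no (false) alarm so far
    rows = np.arange(n_streams)
    h = np.empty(horizon)
    for t in range(horizon):
        # Draw one bin index per stream from that stream's own probabilities.
        u = rng.random(n_streams)
        b = np.minimum((u[:, None] > cdf).sum(axis=1), K - 1)
        z *= 1.0 - lam
        z[rows, b] += lam
        stat = ((z - pi) ** 2 / pi).sum(axis=1)
        # h_t: (1 - alpha)-quantile of T_t among streams still running.
        h[t] = np.quantile(stat[alive], 1.0 - alpha)
        alive &= stat <= h[t]
    return h


h = monte_carlo_thresholds(K=16, N=4096, lam=0.03, arl0=100, horizon=200)
print(h[:3], h[-1])
```

Each threshold $h_t$ removes roughly a fraction $\alpha$ of the simulated streams that are still running, and it depends only on $K$, $N$, $\lambda$ and $ARL_0$, never on the data distribution.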

Abstract

We present Kernel-QuantTree Exponentially Weighted Moving Average (KQT-EWMA), a non-parametric change-detection algorithm that combines the Kernel-QuantTree (KQT) histogram and the EWMA statistic to monitor multivariate data streams online. The resulting monitoring scheme is very flexible, since histograms can model any stationary distribution, and practical, since in stationary conditions the distribution of the test statistic does not depend on the distribution of the data stream (non-parametric monitoring). KQT-EWMA controls false alarms by operating at a pre-determined Average Run Length ($ARL_0$), which is the expected number of stationary samples monitored before a false alarm is triggered. This property sets KQT-EWMA apart from most non-parametric change-detection tests, which can rarely control the $ARL_0$ a priori. Our experiments on synthetic and real-world datasets demonstrate that KQT-EWMA controls the $ARL_0$ while achieving detection delays comparable to or lower than those of state-of-the-art methods designed for the same conditions.
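The combination of a kernel-based histogram and an EWMA statistic can be illustrated with a deliberately simplified Python sketch. It builds $K$ equal-mass bins recursively: at each step it takes the mean and covariance of the training points not yet assigned, uses the squared Mahalanobis distance as the kernel, and places the closest $N/K$ points in a new bin. The choice of centre and kernel, the function names (`fit_kernel_histogram`, `assign_bin`, `monitor`), and the hand-picked threshold `h=1.5` are assumptions for illustration only; in KQT-EWMA the thresholds are calibrated to a target $ARL_0$ rather than fixed by hand.

```python
import numpy as np


def _sq_mahalanobis(X, mu, S_inv):
    """Squared Mahalanobis distance of each row of X from mu."""
    D = X - mu
    return np.einsum("ij,jk,ik->i", D, S_inv, D)


def fit_kernel_histogram(X, K, eps=1e-6):
    """Simplified, KQT-like histogram (assumption: mean/covariance of the
    remaining points define a Mahalanobis kernel at each step)."""
    N, d = X.shape
    n_k = N // K                       # training points per bin (last bin takes the rest)
    remaining = X.copy()
    bins = []
    for _ in range(K - 1):
        mu = remaining.mean(axis=0)
        S_inv = np.linalg.inv(np.cov(remaining, rowvar=False) + eps * np.eye(d))
        dist = _sq_mahalanobis(remaining, mu, S_inv)
        order = np.argsort(dist)
        # Bin boundary halfway between the n_k-th and (n_k+1)-th smallest distances.
        tau = 0.5 * (dist[order[n_k - 1]] + dist[order[n_k]])
        bins.append((mu, S_inv, tau))
        remaining = remaining[order[n_k:]]  # drop the points just binned
    pi = np.full(K, n_k / N)
    pi[-1] = (N - (K - 1) * n_k) / N    # target proportion of the last bin
    return bins, pi


def assign_bin(x, bins):
    """Index of the first bin whose kernel region contains x; else the last bin."""
    for k, (mu, S_inv, tau) in enumerate(bins):
        if _sq_mahalanobis(x[None, :], mu, S_inv)[0] <= tau:
            return k
    return len(bins)


def monitor(stream, bins, pi, lam, h):
    """EWMA of bin indicators (QT-EWMA-style statistic, assumed); returns the
    index of the first sample with T_t > h, or None if no alarm is raised."""
    z = pi.copy()
    for t, x in enumerate(stream):
        b = assign_bin(x, bins)
        z *= 1.0 - lam
        z[b] += lam
        stat = np.sum((z - pi) ** 2 / pi)
        if stat > h:
            return t
    return None


# Usage: Gaussian training data, then a stream whose mean shifts at t = 300.
rng = np.random.default_rng(0)
d, N, K = 3, 2000, 16
X_train = rng.standard_normal((N, d))
bins, pi = fit_kernel_histogram(X_train, K)
stream = np.vstack([rng.standard_normal((300, d)),
                    rng.standard_normal((200, d)) + 2.0])
t_alarm = monitor(stream, bins, pi, lam=0.05, h=1.5)  # h chosen by hand
print("alarm at sample", t_alarm)
```

Note that `monitor` sees the data only through `assign_bin`, so the alarm logic depends on the stream solely through bin indices.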

Paper Structure

This paper contains 13 sections, 9 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: KQT-EWMA
  • Figure 2: Empirical $ARL_0$ and detection delay achieved by the considered methods when monitoring data streams generated by Gaussian mixtures with an increasing number of components ($1$, $2$, $3$). As the number of components increases, the detection-delay advantage of KQT-EWMA with the Weighted Mahalanobis (WM) distance grows, and it generally achieves the lowest delays while controlling false alarms. In all experiments, the GMM used to compute the WM distance has $M=4$ components.
  • Figure 3: Average empirical $ARL_0$ and detection delay on data streams sampled from the UCI datasets, excluding the highest-dimensional ones ("particle" and "sensorless"). For these two datasets, $N=4096$ training samples are not enough for KQT-EWMA with the Mahalanobis and WM distances to properly control the $ARL_0$. On the remaining datasets, KQT-EWMA with the WM distance achieves by far the best performance, halving the detection delay of QT-EWMA while controlling the target $ARL_0$.
  • Figure 4: Average empirical $ARL_0$ and detection delay on data streams from the INSECTS dataset, with different combinations of pre-change distribution $\phi_0$ and post-change distribution $\phi_1$. QT-EWMA and KQT-EWMA achieve similar detection delays, while KQT-EWMA with the WM distance struggles to control higher values of $ARL_0$.
  • Figure 5: Empirical $ARL_0$ and detection delay on Gaussian data streams in $d=4$ dimensions, for training set sizes $N\in\{128,256,1024,4096\}$. The empirical $ARL_0$ (first row) of QT-EWMA and SPLL-CPM is always close to the target values ($500, 1000, 2000, 5000$), while the other methods cannot control the $ARL_0$. When the training set is sufficiently large ($N\in\{1024, 4096\}$), KQT-EWMA controls the false-alarm rate and achieves the lowest detection delay when using the Mahalanobis or the WM distance.
  • ...and 3 more figures