Change Detection in Multivariate Data Streams: Online Analysis with Kernel-QuantTree
Michelangelo Olmo Nogara Notarianni, Filippo Leveni, Diego Stucchi, Luca Frittoli, Giacomo Boracchi
TL;DR
This work addresses online change detection in multivariate data streams without parametric assumptions on the data distribution and with false alarms controlled at a pre-specified level. It introduces KQT-EWMA, which combines a Kernel-QuantTree histogram of the baseline distribution with an EWMA-based statistic to monitor the stream online, using Dirichlet-based Monte Carlo procedures to set thresholds that guarantee the target $ARL_0$ irrespective of the data distribution. The key contributions are a fully non-parametric online detector with pre-specified $ARL_0$, a detailed complexity analysis, and extensive experiments showing state-of-the-art detection delays, especially for complex distributions, while maintaining false-alarm control. The method is relevant to real-time monitoring in industrial, network, and security applications, where rapid detection and reliable false-alarm control are essential.
Abstract
We present Kernel-QuantTree Exponentially Weighted Moving Average (KQT-EWMA), a non-parametric change-detection algorithm that combines the Kernel-QuantTree (KQT) histogram and the EWMA statistic to monitor multivariate data streams online. The resulting monitoring scheme is flexible, since histograms can model any stationary distribution, and practical, since the distribution of the test statistics does not depend on the distribution of the data stream in stationary conditions (non-parametric monitoring). KQT-EWMA controls false alarms by operating at a pre-determined Average Run Length ($ARL_0$), which measures the expected number of stationary samples monitored before a false alarm is triggered. This property contrasts with most non-parametric change-detection tests, which can rarely control the $ARL_0$ a priori. Our experiments on synthetic and real-world datasets demonstrate that KQT-EWMA controls the $ARL_0$ while achieving detection delays comparable to or lower than those of state-of-the-art methods designed to work under the same conditions.
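To make the idea of monitoring a histogram with an EWMA statistic concrete, the following is a minimal sketch under several assumptions that are not taken from the paper. A univariate equal-frequency histogram stands in for the KQT histogram, the statistic is assumed to be a Pearson-like distance between EWMA-smoothed bin frequencies and the target bin probabilities, and the threshold is a fixed placeholder rather than the Monte Carlo thresholds that provide $ARL_0$ control. The names `fit_quantile_bins` and `ewma_monitor`, and the values of `lam` and `threshold`, are hypothetical.

```python
import numpy as np

def fit_quantile_bins(train, n_bins):
    """Inner edges of an equal-frequency histogram.

    Hypothetical univariate stand-in for the KQT histogram: each bin
    holds roughly 1/n_bins of the stationary training data.
    """
    probs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(train, probs)

def ewma_monitor(stream, edges, lam=0.03, threshold=1.0):
    """Return the index of the first alarm, or None if no alarm is raised.

    Assumed statistic: Pearson-like distance between EWMA-smoothed bin
    frequencies and the target probabilities. The fixed threshold is an
    arbitrary placeholder and does NOT control the ARL_0.
    """
    n_bins = len(edges) + 1
    pi = np.full(n_bins, 1.0 / n_bins)   # target bin probabilities
    z = pi.copy()                        # EWMA state, started at the target
    for t, x in enumerate(stream):
        y = np.zeros(n_bins)
        y[np.searchsorted(edges, x)] = 1.0   # one-hot indicator of the bin of x
        z = (1.0 - lam) * z + lam * y        # EWMA update of bin frequencies
        stat = float(np.sum((z - pi) ** 2 / pi))
        if stat > threshold:
            return t
    return None

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=4096)            # stationary training set
edges = fit_quantile_bins(train, n_bins=8)
stream = np.concatenate([
    rng.normal(0.0, 1.0, size=1000),               # stationary segment
    rng.normal(3.0, 1.0, size=1000),               # mean shift at t = 1000
])
alarm_t = ewma_monitor(stream, edges)
print("change at t = 1000, alarm at t =", alarm_t)
```

Because the update only sees which bin each sample falls into, and the bins are built to be equally likely under the training distribution, the behaviour of the statistic in stationary conditions depends only weakly on what that distribution is. This is the intuition behind non-parametric monitoring; the sketch does not reproduce the threshold computation that makes the false-alarm guarantee precise.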
