An Evaluation of Real-time Adaptive Sampling Change Point Detection Algorithm using KCUSUM
Vijayalakshmi Saravanan, Perry Siehien, Shinjae Yoo, Hubertus Van Dam, Thomas Flynn, Christopher Kelly, Khaled Z Ibrahim
TL;DR
The study addresses real-time change point detection in streaming scientific data when little is known about the underlying distributions. It introduces KCUSUM, a non-parametric, kernel-based extension of CUSUM that uses the Maximum Mean Discrepancy (MMD) to compare incoming observations with a pre-change reference sample. It also derives bounds on the time to false alarm (ARL2FA) and on the detection delay (ESADD) through a random-walk interpretation of MMD. The authors support these theoretical results with Monte Carlo experiments on molecular dynamics-type tasks and on real datasets (e.g., NWChem CODAR and protein folding), showing that detection delay grows logarithmically with the time to false alarm. The approach offers a practical, flexible framework for online change detection in high-volume simulations and in non-Euclidean data, with potential extensions to a broader choice of kernels and to real-world deployments.
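To make the random-walk interpretation of MMD concrete, here is a minimal Python sketch. It is not the authors' code: it assumes a Gaussian kernel and the linear-time (pairwise) MMD estimator, and the function names, bandwidth, and synthetic data are illustrative choices. Each pair of incoming samples and pair of reference samples yields one increment whose mean is zero when the two distributions agree and equals the squared MMD (positive for a characteristic kernel such as the Gaussian) after a change. Partial sums of these increments therefore behave like a random walk whose drift becomes positive at the change point.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    diff = np.atleast_1d(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(np.exp(-(diff @ diff) / (2.0 * bandwidth ** 2)))


def mmd_increment(x1, x2, y1, y2, bandwidth=1.0):
    """Linear-time squared-MMD term for two incoming samples (x1, x2) and two
    reference samples (y1, y2). Its expectation is the squared MMD between
    the incoming and reference distributions: zero before a change and
    positive after it (for a characteristic kernel such as the Gaussian)."""
    return (gaussian_kernel(x1, x2, bandwidth)
            + gaussian_kernel(y1, y2, bandwidth)
            - gaussian_kernel(x1, y2, bandwidth)
            - gaussian_kernel(x2, y1, bandwidth))


def mean_increment(stream, reference, bandwidth=1.0):
    """Average the squared-MMD term over consecutive, non-overlapping pairs."""
    n = (min(len(stream), len(reference)) // 2) * 2
    terms = [mmd_increment(stream[i], stream[i + 1],
                           reference[i], reference[i + 1], bandwidth)
             for i in range(0, n, 2)]
    return float(np.mean(terms))


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(2000, 1))  # pre-change reference samples
same = rng.normal(0.0, 1.0, size=(2000, 1))        # same distribution (no change)
shifted = rng.normal(2.0, 1.0, size=(2000, 1))     # mean shifted by 2
print(mean_increment(same, reference))     # expected to be near 0
print(mean_increment(shifted, reference))  # expected to be clearly positive
```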
Abstract
Detecting abrupt changes in real-time data streams from scientific simulations is challenging and demands accurate, efficient algorithms. Identifying change points in a live data stream requires continuously scrutinizing incoming observations for deviations in their statistical characteristics, particularly when data volumes are high. Balancing prompt detection of sudden changes against a low rate of false alarms is vital. Many existing algorithms for this purpose assume known probability distributions, which limits their applicability. In this study, we introduce the Kernel-based Cumulative Sum (KCUSUM) algorithm, a non-parametric extension of the widely used Cumulative Sum (CUSUM) method that enables online change point detection under less restrictive conditions. KCUSUM distinguishes itself by comparing incoming samples directly with reference samples and computing a statistic grounded in the non-parametric Maximum Mean Discrepancy (MMD) framework. This extends KCUSUM's applicability to scenarios where only reference samples are available, such as atomic trajectories of proteins in vacuum, and allows deviations from the reference sample to be detected without prior knowledge of the data's underlying distribution. Furthermore, by exploiting the random-walk structure inherent in MMD, we can theoretically analyze KCUSUM's performance across various use cases, including the expected detection delay and the mean time to false alarm. Finally, we discuss real-world use cases from scientific simulations, such as NWChem CODAR and protein folding data, demonstrating KCUSUM's practical effectiveness in online change point detection.
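The comparison of incoming samples with reference samples can be sketched as a CUSUM recursion on kernel increments. This is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a Gaussian kernel; a hypothetical `sample_reference` callable that returns one stored pre-change sample; a drift parameter `delta` subtracted from each increment so that the statistic drifts downward before a change; and an alarm threshold `threshold`. Every second observation, the statistic is updated as z = max(0, z + h - delta), where h is the linear-time squared-MMD term built from the last two incoming samples and two reference draws. All parameter values below are assumptions chosen for the synthetic example.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    diff = np.atleast_1d(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(np.exp(-(diff @ diff) / (2.0 * bandwidth ** 2)))


def kcusum(stream, sample_reference, delta=0.2, threshold=10.0, bandwidth=1.0):
    """Return the index of the first alarm, or None if no alarm is raised.

    Every second observation, the linear-time squared-MMD term h is formed
    from the last two incoming samples and two reference draws, and the
    CUSUM statistic is updated as z = max(0, z + h - delta).
    """
    z = 0.0
    prev_x = prev_y = None
    for t, x in enumerate(stream):
        y = sample_reference()  # one draw from the pre-change reference data
        if t % 2 == 1:
            h = (gaussian_kernel(prev_x, x, bandwidth)
                 + gaussian_kernel(prev_y, y, bandwidth)
                 - gaussian_kernel(prev_x, y, bandwidth)
                 - gaussian_kernel(x, prev_y, bandwidth))
            z = max(0.0, z + h - delta)
            if z > threshold:
                return t
        prev_x, prev_y = x, y
    return None


rng = np.random.default_rng(1)
reference_pool = rng.normal(0.0, 1.0, size=(5000, 1))  # stored pre-change samples


def sample_reference():
    """Hypothetical sampler: draw one stored reference sample with replacement."""
    return reference_pool[rng.integers(len(reference_pool))]


# 300 pre-change observations followed by 300 post-change (mean-shifted) ones.
stream = np.concatenate([rng.normal(0.0, 1.0, size=(300, 1)),
                         rng.normal(2.0, 1.0, size=(300, 1))])
alarm = kcusum(stream, sample_reference)
print("alarm at index:", alarm)  # the true change occurs at index 300
```

In this sketch, `delta` and `threshold` control the trade-off discussed in the abstract. By standard CUSUM-style arguments, raising `threshold` lengthens the mean time to false alarm roughly exponentially while increasing the detection delay only about linearly, which matches the logarithmic relationship between delay and time to false alarm described in the TL;DR.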
