Reproduction of scan B-statistic for kernel change-point detection algorithm
Zihan Wang
TL;DR
The paper tackles online change-point detection with a distribution-free approach by reproducing and evaluating the kernel-based scan B-statistic (SBSK). The method builds on the maximum mean discrepancy (MMD) framework: a B-test over reference blocks yields a standardized online statistic $$Z_{B_0,t}'$$, and the stopping rule raises an alarm the first time $$Z_{B_0,t}'>b$$, where the threshold $$b$$ is derived from an approximation to the average run length (ARL). Comparing SBSK with Hotelling’s $$T^2$$ and the generalized likelihood ratio (GLR) statistic across diverse change scenarios, the study finds that SBSK yields consistently superior detection performance, particularly in non-Gaussian settings, and that subsampling can modestly improve variance estimation and detection. The findings support the practical viability of SBSK for robust, online change-point detection in real-world data streams.
Abstract
Change-point detection has garnered significant attention due to its broad range of applications, including epidemic disease outbreaks, social network evolution, image analysis, and wireless communications. In an online setting, where new data samples arrive sequentially, it is crucial to continuously test whether these samples originate from a different distribution. Ideally, the detection algorithm should be distribution-free to ensure robustness in real-world applications. In this paper, we reproduce a recently proposed online change-point detection algorithm based on an efficient kernel-based scan B-statistic, and compare its performance with two commonly used parametric statistics. Our numerical experiments demonstrate that the scan B-statistic consistently delivers superior performance. In more challenging scenarios, parametric methods may fail to detect changes, whereas the scan B-statistic successfully identifies them in a timely manner. Additionally, the use of subsampling techniques offers a modest improvement to the original algorithm.
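As an illustration of the procedure summarized above, the following is a minimal sketch of an online kernel scan statistic, not the implementation evaluated in this paper. It averages the unbiased MMD estimate between the most recent block of $$B_0$$ samples and several blocks drawn from a reference pool, standardizes the result, and stops when the standardized value exceeds a threshold. Several choices are assumptions made for brevity: the Gaussian kernel and its bandwidth, all function and parameter names (`gaussian_kernel`, `mmd2_unbiased`, `b_statistic`, `null_std`, `detect`), the standardization by an empirical null standard deviation estimated from reference data instead of a closed-form variance, and a threshold passed in by the caller instead of one derived from an ARL approximation.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(x, y, bandwidth):
    """Unbiased MMD^2 estimate between two blocks of equal size."""
    b = x.shape[0]
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # h[i, j] = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i)
    h = kxx + kyy - kxy - kxy.T
    np.fill_diagonal(h, 0.0)  # the estimator sums over i != j only
    return h.sum() / (b * (b - 1))


def b_statistic(reference, test_block, n_blocks, bandwidth, rng):
    """Average MMD^2 between the test block and n_blocks reference blocks."""
    b0 = test_block.shape[0]
    values = []
    for _ in range(n_blocks):
        idx = rng.choice(reference.shape[0], size=b0, replace=False)
        values.append(mmd2_unbiased(reference[idx], test_block, bandwidth))
    return float(np.mean(values))


def null_std(reference, b0, n_blocks, bandwidth, rng, n_rep=200):
    """Empirical null standard deviation (assumption: this replaces a
    closed-form variance). Each repetition holds out b0 reference points
    as a pseudo test block and scores it against the remaining points."""
    values = []
    for _ in range(n_rep):
        perm = rng.permutation(reference.shape[0])
        held_out, rest = reference[perm[:b0]], reference[perm[b0:]]
        values.append(b_statistic(rest, held_out, n_blocks, bandwidth, rng))
    return float(np.std(values))


def detect(reference, stream, b0, n_blocks, bandwidth, threshold, rng):
    """Return the first time t at which the standardized statistic on
    stream[t - b0:t] exceeds the threshold, or None if it never does."""
    sigma = null_std(reference, b0, n_blocks, bandwidth, rng)
    for t in range(b0, stream.shape[0] + 1):
        z = b_statistic(reference, stream[t - b0:t], n_blocks, bandwidth, rng)
        if z / sigma > threshold:
            return t
    return None
```

A call such as `detect(reference, stream, b0=20, n_blocks=5, bandwidth=1.0, threshold=5.0, rng=np.random.default_rng(0))` scans a stream stored as a two-dimensional array with one sample per row. The values shown are placeholders: in practice the bandwidth would be tuned to the data and the threshold chosen to meet a target ARL.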
