Predictive change point detection for heterogeneous data
Anna-Christina Glock, Florian Sobieczky, Johannes Fürnkranz, Peter Filzmoser, Martin Jech
TL;DR
This work tackles online change point detection in time series with non-stationary, evolving trends. It introduces Predict and Compare (P&C), a framework that uses a predictive model (e.g., ARIMA or LSTM) to forecast trend evolution and a CUSUM-based Compare step to flag deviations from the forecast as change points, enabling online detection in heterogeneous data. The authors formalize P&C, instantiate it with ARIMA and LSTM predictors, apply a standardization step to remove linear trends, and compare its performance against BFAST, Bayesian CPD, OCD, and classical CUSUM on tribological wear data, showing fewer false positives and competitive detection delays. The approach offers a flexible path to robust online CPD in settings with gradual, non-trivial trend changes, with potential applicability to manufacturing and condition-monitoring domains.
Abstract
A change point detection (CPD) framework assisted by a predictive machine learning model, called "Predict and Compare", is introduced and characterised in relation to other state-of-the-art online CPD routines, which it outperforms in terms of false positive rate and out-of-control average run length. The method's focus is on improving standard methods from sequential analysis, such as the CUSUM rule, in terms of these quality measures. This is achieved by replacing typically used trend estimation functionals, such as the running mean, with more sophisticated predictive models (Predict step) and comparing their prognosis with the actual data (Compare step). The two models used in the Predict step are the ARIMA model and the LSTM recurrent neural network. However, the framework is formulated in general terms, so as to allow the use of prediction or comparison methods other than those tested here. The power of the method is demonstrated in a tribological case study in which change points separating the run-in, steady-state, and divergent wear phases are detected in the regime of very few false positives.
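To make the two steps concrete, the following is a minimal sketch of a Predict-and-Compare loop and not the authors' implementation. It rests on several assumptions: a least-squares AR(p) model stands in for the ARIMA and LSTM predictors, the Compare step is a standard two-sided tabular CUSUM on standardized one-step forecast errors, and the function names (`fit_ar`, `predict_and_compare`), the parameters `n_train`, `p`, `k`, `h`, and the synthetic mean-shift data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_ar(train, p):
    """Least-squares fit of an AR(p) model with intercept.

    Assumption: this simple predictor is only a stand-in for the
    ARIMA / LSTM models named in the abstract.
    """
    n = len(train)
    # Row j holds [train[j], ..., train[j+p-1]]; the target is train[j+p].
    lags = np.column_stack([train[i:n - p + i] for i in range(p)])
    design = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(design, train[p:], rcond=None)
    resid = train[p:] - design @ coef
    sigma = resid.std(ddof=p + 1)  # in-sample residual scale
    return coef, sigma


def predict_and_compare(x, n_train=100, p=3, k=0.5, h=10.0):
    """Return the index of the first alarm, or None if no alarm is raised.

    k (reference value) and h (decision threshold) are in units of the
    residual standard deviation; the defaults are illustrative only.
    """
    x = np.asarray(x, dtype=float)
    coef, sigma = fit_ar(x[:n_train], p)
    s_pos = s_neg = 0.0
    for t in range(n_train, len(x)):
        # Predict step: one-step-ahead forecast from the last p observations.
        pred = coef[0] + x[t - p:t] @ coef[1:]
        # Compare step: two-sided CUSUM on the standardized forecast error.
        z = (x[t] - pred) / sigma
        s_pos = max(0.0, s_pos + z - k)
        s_neg = max(0.0, s_neg - z - k)
        if s_pos > h or s_neg > h:
            return t
    return None


# Synthetic example (assumed data): unit-variance noise with a mean
# shift of five standard deviations at index 200.
rng = np.random.default_rng(0)
series = rng.normal(size=300)
series[200:] += 5.0
print("first alarm at index:", predict_and_compare(series))
```

In this sketch the predictor is fitted once on an initial in-control window and then kept fixed. Refitting it as new in-control data arrive, or replacing `fit_ar` with a richer forecaster, changes only the Predict step and leaves the Compare step as it is, which is the kind of modularity the abstract describes.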
