Predictive change point detection for heterogeneous data

Anna-Christina Glock, Florian Sobieczky, Johannes Fürnkranz, Peter Filzmoser, Martin Jech

TL;DR

This work tackles online change point detection in time series with non-stationary, evolving trends. It introduces Predict and Compare (P&C), a framework that uses predictive models (e.g., ARIMA or LSTM) to forecast trend evolution and a CUSUM-based Compare step to flag deviations as change points, enabling online detection in heterogeneous data. The authors formalize P&C, instantiate it with ARIMA and LSTM predictors, apply a standardization step to remove linear trends, and compare performance against BFAST, Bayesian CPD, OCD, and classical CUSUM in tribology wear data, showing reduced false positives and competitive detection delays. The approach offers a flexible path to robust online CPD in settings with gradual, non-trivial trend changes, with potential applicability to manufacturing and condition-monitoring domains.

Abstract

A change point detection (CPD) framework assisted by a predictive machine learning model, called "Predict and Compare", is introduced and characterised in relation to other state-of-the-art online CPD routines, which it outperforms in terms of false positive rate and out-of-control average run length. The method aims to improve standard methods from sequential analysis, such as the CUSUM rule, with respect to these quality measures. This is achieved by replacing commonly used trend estimation functionals, such as the running mean, with more sophisticated predictive models (Predict step) and comparing their forecasts with the actual data (Compare step). The two models used in the Predict step are the ARIMA model and the LSTM recurrent neural network. However, the framework is formulated in general terms, so that prediction or comparison methods other than those tested here can be used. The power of the method is demonstrated in a tribological case study in which the change points separating the run-in, steady-state, and divergent wear phases are detected with very few false positives.
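
The Predict and Compare steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: a least-squares linear extrapolation stands in for the ARIMA or LSTM predictor, the Compare step is a two-sided CUSUM on standardized forecast errors, and the function names, the block-wise refitting scheme, and the parameters `input_len`, `horizon`, `k`, and `h` are all assumptions made for illustration.

```python
from statistics import mean, stdev


def fit_linear_trend(window):
    """Least-squares line through (0, w[0]), (1, w[1]), ...; returns (slope, intercept)."""
    n = len(window)
    x_bar = (n - 1) / 2
    y_bar = mean(window)
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    sxy = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(window))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_and_compare(series, input_len, horizon, k=0.5, h=5.0):
    """Return the index of the first alarm, or None if no alarm is raised.

    Predict: fit a stand-in predictor (a straight line, an assumption) on the
    last `input_len` points and forecast the next `horizon` points.
    Compare: run a two-sided CUSUM on the standardized forecast errors,
    with allowance `k` and decision threshold `h` (both hypothetical defaults).
    """
    t = input_len
    while t < len(series):
        window = series[t - input_len:t]
        slope, intercept = fit_linear_trend(window)
        # Scale forecast errors by the in-window residual spread.
        residuals = [y - (intercept + slope * i) for i, y in enumerate(window)]
        sigma = max(stdev(residuals), 1e-9)  # guard against a zero spread
        s_pos = s_neg = 0.0  # CUSUM statistics restart for each forecast block
        for j in range(min(horizon, len(series) - t)):
            forecast = intercept + slope * (input_len + j)
            z = (series[t + j] - forecast) / sigma
            s_pos = max(0.0, s_pos + z - k)  # accumulates upward deviations
            s_neg = max(0.0, s_neg - z - k)  # accumulates downward deviations
            if s_pos > h or s_neg > h:
                return t + j
        t += horizon  # no alarm: slide forward and refit the predictor
    return None


# Synthetic example: a linear trend with small alternating noise whose slope
# increases at index 200.
base = [0.01 * i + 0.1 * (-1) ** i for i in range(400)]
changed = [y + (0.05 * (i - 200) if i >= 200 else 0.0) for i, y in enumerate(base)]
print("alarm without change:", predict_and_compare(base, 50, 25))
print("alarm with change:", predict_and_compare(changed, 50, 25))
```

Because the predictor follows the local trend, the forecast errors stay small while the trend persists and grow only once the data depart from it, which is what lets the CUSUM threshold be crossed shortly after a change rather than during ordinary trend evolution. Swapping in a different predictor only requires replacing the fitting and forecasting lines.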

Paper Structure

This paper contains 35 sections, 6 equations, 11 figures, 6 tables, 1 algorithm.

Figures (11)

  • Figure 1: Principle of P&C: The diagram shows CPs at the vertical dotted lines. The data are standardized measurements from a tribological experiment on the wear occurring in a bearing (Sect. \ref{subsec:03_zScore}). From input data (blue) up to $t=5000$, an online prediction (red) is made starting at $t=5001$; it deviates from the trend of the data after the CP. This facilitates the subsequent detection of the CP by an online sequential statistical test. Note that the data is heterogeneous, as different non-stationary trend patterns are concatenated.
  • Figure 2: Heterogeneous time series data with regimes of different characteristic trends (left). See \ref{subsec:03_dataExp} for a detailed description of the tribological origin of the data. Also shown is a transformed version of the time series yielding a 'steady state' in one of the regimes (right), as described in \ref{subsec:03_zScore}. Detecting change points into and out of stationarity will be seen to be simpler and to add to the power of the detection method.
  • Figure 3: An illustration of P&C on data with a change point (left) and without a change point (right). The orange data points are used as input ($I_t$) for a predictive model $\widehat{f}_t$, whose predictions (green points) are then compared to the real data (blue points) on the prediction interval ($J_t$). The grey points are not used for the predictor $\widehat{f}_t$.
  • Figure 4: The top three plots show artificial data samples used to test the Predict and Compare method. The second and third vertical dashed orange lines symbolize the CPs into regime K and regime A, respectively. Bottom: The plot shows the results of P&C with LSTM as the learning method applied to these two change points. The colors signify the respective strength of the signal-to-noise ratio from the diagrams above. Each point represents the averaged results of P&C over different parameter sets for one data sample. The x-axis is the difference between the labeled and the found change point. The false positive count (Fpc) is shown on the y-axis. Both averages are aggregated by using the mean.
  • Figure 5: This picture shows a bench test setup. On the left side, one can see the test bench. Below are the supply hydraulics, and in the upper area a wear test has been set up. From the wear test, two hoses run horizontally to the right and continue downwards. The lower hose transports the lubricant from the wear test to the RIC, located to the right, and the upper hose returns the lubricant to the wear test. On the right side, one can see the computer used to monitor the test. (Photograph taken by Dr. M. Jech, publication granted with courtesy of the Austrian Competence Centre for Tribology, AC$^2$T research GmbH)
  • ...and 6 more figures