Automatic Change-Point Detection in Time Series via Deep Learning

Jie Li; Paul Fearnhead; Piotr Fryzlewicz; Tengyao Wang

TL;DR

The paper reframes offline change-point detection as a supervised learning task: neural networks are trained on labelled change/no-change examples. It shows that standard tests such as CUSUM can be represented as neural networks, and that learned detectors can match or outperform them under model misspecification. It provides theoretical generalisation bounds that decompose the error into the baseline test error plus a VC-dimension term, and shows that relatively small networks can achieve competitive performance with practical training sizes. Empirically, the method performs on par with CUSUM under independent Gaussian noise and substantially better under autocorrelated or heavy-tailed noise, and it is applied successfully to detecting activity changes in accelerometer data. The work also demonstrates a scalable path to detecting and classifying multiple change types using deep residual CNNs, presented through a case study on real data, offering a flexible framework for generating detectors automatically in diverse settings.

Abstract

Detecting change-points in data is challenging because of the range of possible types of change and types of behaviour of data when there is no change. Statistically efficient methods for detecting a change will depend on both of these features, and it can be difficult for a practitioner to develop an appropriate detection method for their application of interest. We show how to automatically generate new offline detection methods based on training a neural network. Our approach is motivated by many existing tests for the presence of a change-point being representable by a simple neural network, and thus a neural network trained with sufficient data should have performance at least as good as these methods. We present theory that quantifies the error rate for such an approach, and how it depends on the amount of training data. Empirical results show that, even with limited training data, its performance is competitive with the standard CUSUM-based classifier for detecting a change in mean when the noise is independent and Gaussian, and can substantially outperform it in the presence of auto-correlated or heavy-tailed noise. Our method also shows strong results in detecting and localising changes in activity based on accelerometer data.
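
As a point of reference for the CUSUM-based classifier mentioned above, here is a minimal sketch of the textbook CUSUM test for a single change in mean. The function names and the threshold value are assumptions made for illustration, not taken from the paper, and the paper's exact normalisation may differ.

```python
import numpy as np

def cusum_statistics(x):
    """Return the CUSUM statistics C_1, ..., C_{n-1} of a 1-D series x.

    C_t = sqrt(t (n - t) / n) * (mean(x[:t]) - mean(x[t:])).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    t = np.arange(1, n)                              # candidate split points 1..n-1
    csum = np.cumsum(x)
    left_mean = csum[:-1] / t                        # mean of the first t points
    right_mean = (csum[-1] - csum[:-1]) / (n - t)    # mean of the remaining n - t points
    return np.sqrt(t * (n - t) / n) * (left_mean - right_mean)

def cusum_classifier(x, threshold):
    """Return 1 ("change") if max_t |C_t| exceeds the threshold, else 0."""
    return int(np.max(np.abs(cusum_statistics(x))) > threshold)

# Noise-free step of size 1 at the midpoint of a series of length 100.
step = np.concatenate([np.zeros(50), np.ones(50)])
# max |C_t| is 5 here (attained at t = 50), so the classifier returns 1.
label = cusum_classifier(step, threshold=3.0)        # threshold is a hypothetical choice
```

In practice the threshold would be chosen to control the error rate under the assumed noise model; the value 3.0 above is only a placeholder.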

Paper Structure (32 sections, 13 theorems, 46 equations, 15 figures, 3 tables, 1 algorithm)

Introduction
Neural networks
CUSUM-based classifier and its generalisations are neural networks
Change in mean
Beyond the mean change model
Generalisation error of neural network change-point classifiers
Numerical study
Detecting multiple changes and multiple change types -- case study
Discussion
Proofs
Proof of Lemma \ref{lem:CUSUMinNNet}
Proof of Lemma \ref{lem:generaltest}
Proof of Lemma \ref{lem:hypothesis_test}
Proof of Corollary \ref{cor:Generalisation}
Auxiliary Lemma
...and 17 more sections

Key Result

Lemma 3.1

For any $\lambda > 0$, we have $h^{\mathrm{CUSUM}}_\lambda(\boldsymbol{x}) \in \mathcal{H}_{1, 2n-2}$.
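
To make the lemma concrete, the sketch below gives one way to write the CUSUM classifier as a network with a single hidden layer of $2n-2$ ReLU units. It assumes the standard CUSUM statistic $\boldsymbol{v}_t^{\top}\boldsymbol{x}$ for $t=1,\ldots,n-1$ and uses one hidden unit for each of $\pm\boldsymbol{v}_t$, each with bias $-\lambda$. The construction and all function names are illustrative assumptions and may differ from the one used in the paper's proof.

```python
import numpy as np

def cusum_weights(n):
    """Rows v_t (t = 1..n-1) such that v_t @ x is the CUSUM statistic at split t."""
    V = np.zeros((n - 1, n))
    for t in range(1, n):
        scale = np.sqrt(t * (n - t) / n)
        V[t - 1, :t] = scale / t            # weights on the first t observations
        V[t - 1, t:] = -scale / (n - t)     # weights on the remaining n - t observations
    return V

def cusum_direct(x, lam):
    """CUSUM classifier: 1 if max_t |v_t @ x| > lam, else 0."""
    x = np.asarray(x, dtype=float)
    return int(np.max(np.abs(cusum_weights(x.size) @ x)) > lam)

def cusum_as_relu_net(x, lam):
    """Same classifier as a one-hidden-layer ReLU network of width 2n - 2."""
    x = np.asarray(x, dtype=float)
    V = cusum_weights(x.size)
    W = np.vstack([V, -V])                    # (2n - 2) x n hidden-layer weights
    hidden = np.maximum(W @ x - lam, 0.0)     # ReLU(+/- v_t @ x - lam)
    # Hidden units are non-negative, so their sum is positive
    # exactly when at least one |v_t @ x| exceeds lam.
    return int(hidden.sum() > 0)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
x[60:] += 1.5                                 # hypothetical mean shift at position 60
agree = cusum_direct(x, lam=3.0) == cusum_as_relu_net(x, lam=3.0)
```

The two functions agree because a sum of ReLU outputs is strictly positive if and only if some unit is active, which is the event $\max_t |\boldsymbol{v}_t^{\top}\boldsymbol{x}| > \lambda$.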

Figures (15)

Figure 1: A neural network with 2 hidden layers and width vector $\mathbf{m}=(4,4)$.
Figure 2: Plot of the test set MER, computed on a test set of size $N_{\mathrm{test}}=30000$, against training sample size $N$ for detecting the existence of a change-point on data series of length $n=100$. We compare the performance of the CUSUM test and neural networks from four function classes: $\mathcal{H}_{1,m^{(1)}}$, $\mathcal{H}_{1,m^{(2)}}$, $\mathcal{H}_{5,m^{(1)}\mathbf{1}_{5}}$ and $\mathcal{H}_{10,m^{(1)}\mathbf{1}_{10}}$, where $m^{(1)} = 4\lfloor\log_2(n)\rfloor$ and $m^{(2)} = 2n-2$, under scenarios S1, S1$^{\prime}$, S2 and S3 described in Section \ref{sec:Simulation_Study}.
Figure 3: The sequence of accelerometer data in the $x$, $y$ and $z$ axes. From left to right, there are four activities: "stair down", "stay", "stair up" and "walk". The change-points, at 990, 1691 and 2733, are marked by black solid lines. The grey rectangles represent the "no-change" group, with labels "stair down", "stair up" and "walk"; the red rectangles represent the "one-change" group, with labels "stair down$\to$stay", "stay$\to$stair up" and "stair up$\to$walk".
Figure 4: Change-point detection in HASC data. The red vertical lines represent the underlying change-points; the blue vertical lines represent the estimated change-points. More details on multiple change-point detection can be found in Section \ref{sec:More_Details_of_Numerical_Study_and_Real_Data_Analysis}.
Figure 5: Scenario S3 with Cauchy noise, with two additional methods: the Wilcoxon-type change-point test of Dehling et al. (2013) and a simple neural network with truncation in data preprocessing. The average misclassification error rate (MER), computed on a test set of size $N_{\mathrm{test}}=15000$, is plotted against training sample size $N$ for detecting the existence of a change-point on data series of length $n=100$. We compare the performance of the CUSUM test, the Wilcoxon test, $\mathcal{H}_{1,m^{(2)}}$ and $\mathcal{H}_{1,m^{(2)}}$ with $Z=3$, where $m^{(2)} = 2n-2$ and $Z=3$ denotes truncation at a $z$-score of 3: given a vector $\boldsymbol{x}= ( x_{1},x_{2},\ldots,x_{n} )^{\top}$ with mean $\bar{x}$ and standard deviation $\sigma_{x}$, each entry with $\left|x_{i}-\bar{x}\right| > Z\sigma_{x}$ is replaced by $\bar{x}+\mathrm{sgn}(x_{i}-\bar{x})Z\sigma_{x}$.
...and 10 more figures
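
The truncated $z$-score preprocessing described in the Figure 5 caption can be sketched as follows. The function name is hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since the caption does not say which estimator is used.

```python
import numpy as np

def truncate_z_score(x, z=3.0):
    """Clip entries of x that lie more than z standard deviations from the mean.

    An entry with |x_i - mean| > z * sd is replaced by mean + sgn(x_i - mean) * z * sd;
    all other entries are left unchanged.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    sd = x.std()          # assumption: population standard deviation (ddof=0)
    return np.clip(x, mean - z * sd, mean + z * sd)

# A series of zeros with one extreme outlier, as heavy-tailed noise might produce.
series = np.concatenate([np.zeros(99), [1000.0]])
truncated = truncate_z_score(series, z=3.0)
# The outlier is pulled back to mean + 3 * sd; the zeros are untouched.
```

Clipping to the interval $[\bar{x} - Z\sigma_{x},\ \bar{x} + Z\sigma_{x}]$ is equivalent to the replacement rule in the caption, because values inside the interval are unchanged and values outside it are moved to the nearer endpoint.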

Theorems & Definitions (23)

Lemma 3.1
Lemma 3.2
Lemma 4.1
Corollary 4.1
Theorem 4.2
Theorem 4.3
proof
proof
Lemma A.1
proof
...and 13 more
