Sequential Change Detection in Correlation Structures with Window-Limited Statistics

Jie Gao, Liyan Xie, Zhaoyuan Li

TL;DR

This work considers detecting change points in the correlation structure of streaming data under minimal assumptions on the underlying data distribution. It proposes a novel threshold determination algorithm based on sign-flip permutations that improves the efficiency of the procedure, particularly when the data dimension is large compared to the window size.

Abstract

We consider detecting change points in the correlation structure of streaming data under minimal assumptions on the underlying data distribution. Detection statistics are constructed for dense and sparse change settings, based on the $\ell_1$ and $\ell_{\infty}$ norms, respectively, of the squared difference of vectorized pre- and post-change correlation matrices. We also propose a novel threshold determination algorithm based on sign-flip permutations that enhances the efficiency of our procedure, particularly when the data dimension is large compared to the window size. Theoretical guarantees of the proposed methods are provided in terms of average run length in the no-change regime and expected detection delay in the post-change regime. We evaluate the performance of the proposed methods across a wide range of simulated datasets and demonstrate their effectiveness, with small detection delays that are comparable to those of the exact optimal CUSUM test. Finally, we demonstrate the effectiveness of our methods on real-world datasets, including El Niño event forecasting, where we achieve a state-of-the-art hit rate exceeding 0.86 with near-zero false alarms, as well as seismic event detection.
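
To make the two statistics concrete, the following is a minimal Python sketch, not the authors' implementation. It compares a reference (pre-change) correlation estimate with a sliding-window estimate and takes the $\ell_1$ norm (dense changes) or $\ell_{\infty}$ norm (sparse changes) of the squared differences. The function names, the window size, and the fixed `threshold` are all assumptions made for illustration. In particular, the threshold is a hand-picked placeholder and does not reproduce the sign-flip permutation calibration described above.

```python
import numpy as np


def upper_tri_corr(x):
    """Vectorized upper-triangular (i < j) entries of the sample correlation matrix."""
    corr = np.corrcoef(x, rowvar=False)
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    return corr[rows, cols]


def window_statistics(reference, window):
    """Return the (l1, l_inf) norms of the squared correlation differences."""
    diff_sq = (upper_tri_corr(reference) - upper_tri_corr(window)) ** 2
    return float(diff_sq.sum()), float(diff_sq.max())


def detect(stream, reference, window_size, threshold, use_max=False):
    """First index t at which the chosen statistic on stream[t - w:t] exceeds threshold."""
    for t in range(window_size, stream.shape[0] + 1):
        l1, linf = window_statistics(reference, stream[t - window_size:t])
        if (linf if use_max else l1) > threshold:
            return t
    return None


# Synthetic illustration (all values hypothetical): p = 5 independent
# coordinates before the change, equicorrelated (0.8) coordinates after it.
rng = np.random.default_rng(0)
p, change_point = 5, 200
post_cov = 0.8 * np.ones((p, p)) + 0.2 * np.eye(p)

reference = rng.standard_normal((500, p))  # historical pre-change sample
stream = np.vstack([
    rng.standard_normal((change_point, p)),
    rng.multivariate_normal(np.zeros(p), post_cov, size=200),
])

# threshold=2.0 is a hand-picked placeholder, not a calibrated value.
alarm = detect(stream, reference, window_size=50, threshold=2.0)
print("alarm raised at t =", alarm)
```

Passing `use_max=True` switches to the $\ell_{\infty}$ statistic, which reacts to a large change in a few entries rather than to many small changes accumulated across all pairs.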

Paper Structure

This paper contains 30 sections, 5 theorems, 65 equations, 13 figures, 4 tables, 2 algorithms.

Key Result

Lemma 2.1

Assume $\mathbb{E}[x_{ki}] = 0$ and $\mathbb{E}[x_{ki}^2] = 1$ for $i=1,\ldots,p$, $\forall k$. For $\boldsymbol{v}_{1,t}(i,j) = (\hat{\rho}_0(i,j) - \hat{\rho}_{1:t}(i,j))^2$, we have where $\beta_{20} :=\mathbb{E}_\infty[(x_{ki}x_{kj})^2]$ is the expectation under the pre-change regime, and $\beta_{21} :=\mathbb{E}_1[(x_{ki}x_{kj})^2]$ is the expectation under the post-change regime.
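
As an illustrative special case that is not part of the lemma's statement, suppose $(x_{ki}, x_{kj})$ is bivariate normal with zero means, unit variances, and correlation $\rho$. Isserlis' theorem then gives the fourth-moment quantity in closed form:

$$
\mathbb{E}[(x_{ki}x_{kj})^2] = \mathbb{E}[x_{ki}^2]\,\mathbb{E}[x_{kj}^2] + 2\left(\mathbb{E}[x_{ki}x_{kj}]\right)^2 = 1 + 2\rho^2 .
$$

Writing $\rho_0(i,j)$ and $\rho_1(i,j)$ for the true pre- and post-change correlations (notation assumed here for illustration), this would give $\beta_{20} = 1 + 2\rho_0(i,j)^2$ and $\beta_{21} = 1 + 2\rho_1(i,j)^2$ under normality. For example, $\rho_0(i,j) = 0$ and $\rho_1(i,j) = 0.5$ yield $\beta_{20} = 1$ and $\beta_{21} = 1.5$.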

Figures (13)

  • Figure 1: Comparison of ADDs of WL-Sum, WL-Sum+SMOTE, WL-Sum+Knockoff, and CUSUM under Normal distribution, Case 1 (dense level of change is 1) with varying $r$ values in $\{0.3,0.5,0.8\}$.
  • Figure 2: Comparison of ADDs of WL-Sum, WL-Sum+SMOTE, WL-Sum+Knockoff, and CUSUM under Normal distribution, Case 2 (dense level of change is 0.25) with $r=0.5$.
  • Figure 3: Comparison of ADDs of WL-Max, WL-Max+SMOTE, WL-Max+Knockoff and CUSUM under Normal distribution, Case 3 (dense level of change is $p^{-1.4}$).
  • Figure 4: Comparison of ADDs of WL-Sum, WL-Max and WL-Sum-Max-Combined under Normal distribution, with the dense level varying from 0.0006 to 0.1073. $\gamma=10^5$, $p=60$.
  • Figure 5: (a) The region of nodes; (b) The prediction of El Niño events between 1974 and 2024; (c) Comparison of ST-Sum test statistic and its corresponding SMOTE and knockoff enhancement versions; (d) Comparison of ST-Sum test statistic for different grid sizes.
  • ...and 8 more figures

Theorems & Definitions (12)

  • Lemma 2.1
  • Remark 2.2: The case with unknown mean
  • Remark 2.3
  • Remark 2.4: Information-theoretic lower bound
  • Remark 2.5: Effect of dimension $p$ on the detection procedure
  • Remark 3.1: The choice of window size
  • Remark 3.2
  • Lemma 3.3
  • Remark 3.4: Support Recovery
  • Lemma 4.1: Temporal correlation of sequential detection statistics
  • ...and 2 more