High-Dimensional Change Point Detection using Graph Spanning Ratio

Youngwen Sun, Katerina Papagiannouli, Vladimir Spokoiny

TL;DR

The paper develops a graph-spanning-ratio (GSR) framework for change-point detection in high-dimensional data, applicable to both offline and online settings and to data with unknown distributions. By projecting data through graph-based spanning distances and employing multiple scanning windows with bootstrap/permutation-based thresholds, the method achieves strong detection power, with a minimax separation rate scaling as $\sqrt{nd}$. Theoretical guarantees include type I error control and power bounds, extended to non-Gaussian data via Gaussian/Fisher approximations under mild conditions. Empirically, GSR outperforms competing methods across graph types and demonstrates timely online detection, with compelling validation on S&P 500 stock data.

Abstract

Inspired by graph-based methodologies, we introduce a novel graph-spanning algorithm designed to identify changes in both offline and online data across low to high dimensions. This versatile approach is applicable to Euclidean and graph-structured data with unknown distributions, while maintaining control over error probabilities. Theoretically, we demonstrate that the algorithm achieves high detection power when the magnitude of the change surpasses the lower bound of the minimax separation rate, which scales on the order of $\sqrt{nd}$. Our method outperforms other techniques in terms of accuracy for both Gaussian and non-Gaussian data. Notably, it maintains strong detection power even with small observation windows, making it particularly effective for online environments where timely and precise change detection is critical.
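The scanning-window idea with a permutation-based threshold can be sketched in a few lines. The block below is only an illustration under stated assumptions: the abstract does not give the GSR formula, so `split_ratio` uses a stand-in complete-graph statistic (mean between-segment distance divided by mean within-segment distance), and all function names and the defaults `n_perm`, `alpha`, and `seed` are hypothetical choices, not the authors' definitions.

```python
import numpy as np


def pairwise_distances(window):
    """Euclidean distance matrix of the observations in one window (rows = time points)."""
    diffs = window[:, None, :] - window[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def split_ratio(dist, order, k):
    """Stand-in statistic (an assumption, not the paper's GSR):
    mean between-segment distance over mean within-segment distance
    when the sequence `order` is split after its first k entries."""
    left, right = order[:k], order[k:]
    between = dist[np.ix_(left, right)].mean()
    # Each within-segment pair appears twice in the symmetric blocks, hence the division by 2.
    within_sum = (dist[np.ix_(left, left)].sum() + dist[np.ix_(right, right)].sum()) / 2.0
    within_pairs = k * (k - 1) / 2 + len(right) * (len(right) - 1) / 2
    return between / (within_sum / within_pairs)


def scan_statistic(dist, order):
    """Maximise the ratio over candidate splits k = 2, ..., m - 2."""
    m = len(order)
    ratios = [split_ratio(dist, order, k) for k in range(2, m - 1)]
    best = int(np.argmax(ratios))
    return ratios[best], best + 2  # (maximal ratio, split position attaining it)


def permutation_scan_test(window, n_perm=200, alpha=0.05, seed=0):
    """Compare the observed scan statistic with a permutation-based threshold."""
    rng = np.random.default_rng(seed)
    dist = pairwise_distances(window)
    m = len(window)
    observed, k_hat = scan_statistic(dist, np.arange(m))
    # Shuffling the time order destroys any change point but keeps the same distances.
    null = np.array([scan_statistic(dist, rng.permutation(m))[0] for _ in range(n_perm)])
    threshold = float(np.quantile(null, 1 - alpha))
    return observed, threshold, k_hat, bool(observed > threshold)
```

Calling `permutation_scan_test(window)` on an array of shape `(2n, d)` returns the observed statistic, the permutation threshold, the maximising split, and the reject decision. Calibrating the threshold on the maximum over all splits, rather than split by split, is one simple way to keep the overall false-alarm rate near `alpha` across the whole scan.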

Paper Structure

This paper contains 32 sections, 26 theorems, 179 equations, 8 figures, 5 tables, 4 algorithms.

Key Result

Theorem 2.2

Suppose that $Y_i$ satisfies the sub-Gaussian condition and $\mathbb{E}|Y_i^{\otimes 4}| < \infty$, then where

Figures (8)

  • Figure 1: Graph representations of two-dimensional sequential data. Complete graphs, MST graphs, and NNG graphs are constructed from 60 i.i.d. normally distributed observations: the first 30 observations (orange) are standard normal, and the second 30 (purple) have a change in mean (upper row) or a change in variance (lower row).
  • Figure 2: GSRs calculated for data with dimensionality $d = 300$ and length $2n = 100$, as $k$ varies over $\{2, \ldots, 2n-2\}$. The red dotted line corresponds to a dataset with a change point at the midpoint, while the blue dotted line represents a dataset without a change point.
  • Figure 3: Mean gap-spanning distance $\Delta_\mu$ needed to ensure $1-\beta$ power in detecting a mean change for window sizes $n=30, 32, 34$ and dimension $d=10$.
  • Figure 4: Detection power $P_{\text{mean}}$ for a mean change $\Delta = 1/\sqrt[3]{d}$ (top row) and a variance change $\Sigma = 2 I_d$ (bottom row), as a function of dimension and window length. Columns correspond to GSR$_{CG}$, GSR$_{MST}$, and GSR$_{NNG}$.
  • Figure 5: Change in graph connectivity from $p=1/2$ to $p=1/3$ ($\Delta p =1/6$). Purple nodes represent the graph data before the change point, while orange nodes represent the graph data after the change point.
  • ...and 3 more figures
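
The three graph types named in the Figure 1 caption (complete graph, MST, NNG) can be built from a Euclidean distance matrix as in the sketch below. This is a generic construction under assumptions: the function names are hypothetical, the MST uses a plain Prim's algorithm, and the NNG is taken to be the 1-nearest-neighbour graph with undirected edges, which may differ from the paper's exact variant.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix between all observations (rows of `points`)."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def complete_graph_edges(dist):
    """Every unordered pair of observations is an edge."""
    m = dist.shape[0]
    return [(i, j) for i in range(m) for j in range(i + 1, m)]


def mst_edges(dist):
    """Minimum spanning tree via Prim's algorithm on the dense distance matrix."""
    m = dist.shape[0]
    in_tree = np.zeros(m, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known distance from the tree to each node
    parent = np.zeros(m, dtype=int)  # tree node that attains that distance
    edges = []
    for _ in range(m - 1):
        # Pick the closest node that is not yet in the tree.
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((min(int(parent[j]), j), max(int(parent[j]), j)))
        in_tree[j] = True
        closer = dist[j] < best
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return edges


def nng_edges(dist):
    """1-nearest-neighbour graph: link each observation to its closest other observation."""
    d = dist.copy()
    np.fill_diagonal(d, np.inf)      # an observation is not its own neighbour
    nearest = d.argmin(axis=1)
    return sorted({(min(i, int(j)), max(i, int(j))) for i, j in enumerate(nearest)})
```

For 60 observations, as in Figure 1, the complete graph has $60 \cdot 59 / 2 = 1770$ edges and the MST has 59. The complete graph uses every pairwise distance, while the MST and NNG keep only short edges, which is why the three graphs look so different in density.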

Theorems & Definitions (44)

  • Theorem 2.2: Bootstrap validity: online
  • Definition 2.3
  • Theorem 2.4: Power of the test
  • Proposition 2.5: $\left( \alpha,\beta \right)$ minimum radius
  • Theorem D.1
  • Lemma D.2: The Delta Method
  • Corollary D.3: Delta theorem for bootstrap
  • Theorem D.4: Bootstrap validity
  • proof
  • Theorem : Bootstrap validity: online
  • ...and 34 more