
Sharp phase transitions in high-dimensional changepoint detection

Daniel Xiang, Chao Gao

TL;DR

The paper analyzes high-dimensional changepoint detection in a $p\times n$ Gaussian matrix and identifies a sharp minimax detection boundary $\rho^*(a,\beta)$ under the regime $\log\log n\sim a\log p$ with sparsity $s\sim p^{\beta}$. For the upper bound, it develops an adaptive test based on a penalized Berk-Jones statistic computed over a geometrically growing grid of candidate changepoints. For the lower bound, a Bayes (least-favorable prior) construction shows that no test succeeds below the boundary, so the two bounds together separate the detectable and undetectable regions. The results cover both one-sided and two-sided changepoint alternatives and are related to the Ingster-Donoho-Jin boundary, to Chan et al.'s work on multiple changepoints, and to submatrix detection, with extensions to non-asymptotic rates and alternative regimes. The resulting phase diagram states precisely when sparse, aligned changepoints can be detected in high-dimensional data, which can guide offline changepoint analysis of multidimensional measurements.
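To make the testing procedure above concrete, here is a minimal Python sketch of a Berk-Jones scan over a dyadic grid of candidate changepoints. It is not the authors' implementation. The grid $t \in \{1, 2, 4, \dots\}$, the use of one-sided CUSUM p-values, and the `penalty` argument (zero by default) are illustrative assumptions; the paper's actual penalty is not reproduced here, and all function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def geometric_grid(n: int) -> List[int]:
    """Hypothetical dyadic grid of candidate changepoints: 1, 2, 4, ... strictly below n."""
    grid, t = [], 1
    while t < n:
        grid.append(t)
        t *= 2
    return grid


def cusum_z(row: Sequence[float], t: int) -> float:
    """Mean after column t minus mean up to t, scaled to be N(0, 1) for a constant-mean row."""
    n = len(row)
    left = sum(row[:t]) / t
    right = sum(row[t:]) / (n - t)
    return math.sqrt(t * (n - t) / n) * (right - left)


def upper_tail_pvalue(z: float) -> float:
    """One-sided p-value P(N(0,1) > z) via erfc, floored at 1e-300 to avoid log(0)."""
    return max(0.5 * math.erfc(z / math.sqrt(2.0)), 1e-300)


def bernoulli_kl(a: float, b: float) -> float:
    """KL divergence K(a, b) between Bernoulli(a) and Bernoulli(b), for 0 < a <= 1, 0 < b < 1."""
    kl = a * math.log(a / b)
    if a < 1.0:  # the (1 - a) term vanishes when a == 1
        kl += (1.0 - a) * math.log((1.0 - a) / (1.0 - b))
    return kl


def berk_jones(pvals: Sequence[float]) -> float:
    """Berk-Jones statistic: m * max_i K(i/m, p_(i)) over indices where p_(i) < i/m."""
    m = len(pvals)
    best = 0.0
    for i, p in enumerate(sorted(pvals), start=1):
        a = i / m
        if p < a:  # only count order statistics that are smaller than expected
            best = max(best, m * bernoulli_kl(a, p))
    return best


def penalized_scan(
    X: Sequence[Sequence[float]],
    penalty: Callable[[int], float] = lambda t: 0.0,
) -> float:
    """Max over the grid of (Berk-Jones of row-wise CUSUM p-values at t) minus penalty(t).

    Requires at least two columns so that the grid is non-empty.
    """
    n = len(X[0])
    scores = []
    for t in geometric_grid(n):
        pvals = [upper_tail_pvalue(cusum_z(row, t)) for row in X]
        scores.append(berk_jones(pvals) - penalty(t))
    return max(scores)
```

Under the null, every row's CUSUM p-value at each grid point is uniform, so the scan stays small; when a sparse set of rows shifts near a grid point, their tiny p-values drive the Berk-Jones term up sharply. A two-sided variant would replace the one-sided p-value with $2\,P(N(0,1) > |z|)$.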

Abstract

We study a hypothesis testing problem in the context of high-dimensional changepoint detection. Given a matrix $X \in \mathbb{R}^{p \times n}$ with independent Gaussian entries, the goal is to determine whether or not a sparse, non-null fraction of rows in $X$ exhibits a shift in mean at a common index between $1$ and $n$. We focus on three aspects of this problem: the sparsity of non-null rows, the presence of a single, common changepoint in the non-null rows, and the signal strength associated with the changepoint. Within an asymptotic regime relating the data dimensions $n$ and $p$ to the signal sparsity and strength, the information-theoretic limits of this testing problem are characterized by a formula that determines whether or not there exists a testing procedure whose sum of Type I and Type II errors tends to zero as $n,p \to \infty$. The formula, called the detection boundary, partitions the parameter space into two regions: one where it is possible to detect the presence of a single aligned changepoint (detectable region), and another where no test is able to consistently distinguish the mean matrix from one with constant rows (undetectable region).
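A minimal sketch of the data model described in the abstract, assuming unit-variance noise, a constant baseline mean for every row, and a shift of size `delta` after column `t` in the first `s` rows (the support choice and all names are hypothetical). The standardized CUSUM at the true changepoint is exactly $N(0,1)$ for a null row, whatever its baseline, and has mean $\delta\sqrt{t(n-t)/n}$ for a non-null row.

```python
import math
import random
from typing import List, Set, Tuple


def simulate_changepoint_matrix(
    p: int, n: int, s: int, t: int, delta: float, seed: int = 0
) -> Tuple[List[List[float]], Set[int]]:
    """Draw X = Theta + Z with Z having i.i.d. N(0, 1) entries.

    Each row gets its own constant baseline mean; rows 0..s-1 (hypothetical
    support) also shift by delta from column t onward. s = 0 gives the null
    model, in which every row of the mean matrix is constant.
    """
    rng = random.Random(seed)
    support = set(range(s))
    X = []
    for i in range(p):
        base = rng.uniform(-1.0, 1.0)
        shift = delta if i in support else 0.0
        X.append([base + (shift if j >= t else 0.0) + rng.gauss(0.0, 1.0) for j in range(n)])
    return X, support


def cusum_z(row: List[float], t: int) -> float:
    """Mean of columns t..n-1 minus mean of columns 0..t-1, scaled by sqrt(t(n-t)/n).

    The baseline cancels in the difference, so a null row gives exactly N(0, 1).
    """
    n = len(row)
    left = sum(row[:t]) / t
    right = sum(row[t:]) / (n - t)
    return math.sqrt(t * (n - t) / n) * (right - left)
```

For example, with `p=100, n=200, s=5, t=100, delta=2.0`, a non-null row has CUSUM mean $2\sqrt{50}\approx 14.1$ while null rows stay near zero. The testing problem is hard precisely when `s` is small relative to `p` and `delta` is small enough that this separation is no longer visible row by row.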

Paper Structure

This paper contains 31 sections, 13 theorems, and 341 equations.

Key Result

Theorem 2.1

For $a > 0$ and $\beta_1 \in (0,1)$, let $\rho^* \coloneqq \rho_{\mathrm{1-side}}^*(a,\beta_1)$, and suppose that $s$, $n$, and $p$ are related via the scaling relation (3log).

Theorems & Definitions (23)

  • Theorem 2.1
  • Theorem 2.2
  • Theorem 3.1 (chan2015optimal)
  • Theorem 3.2
  • Remark 3.1
  • Theorem 3.3
  • Remark 4.1
  • Lemma B.1
  • Proof
  • Lemma B.2
  • ...and 13 more