Sharp phase transitions in high-dimensional changepoint detection
Daniel Xiang, Chao Gao
TL;DR
The paper analyzes high-dimensional changepoint detection in a $p\times n$ Gaussian matrix, identifying a sharp minimax detection boundary $\rho^*(a,\beta)$ under the regime $\log\log n\sim a\log p$ and sparsity $s\sim p^{\beta}$. It develops an adaptive testing procedure based on a penalized Berk-Jones statistic across a geometrically growing grid of candidate changepoints, achieving the upper bound, while a Bayes/least-favorable prior construction yields matching lower bounds to establish the boundary between detectable and undetectable regions. The results are presented for both one-sided and two-sided changepoint alternatives and are connected to the Ingster--Donoho--Jin boundary, Chan et al.’s multiple-changepoint work, and submatrix detection, with extensions to non-asymptotic rates and alternative regimes. The findings offer a precise phase diagram for when sparse, aligned changepoints can be detected in high-dimensional data, informing practical guidelines for offline changepoint analysis in multidimensional measurements.
Abstract
We study a hypothesis testing problem in the context of high-dimensional changepoint detection. Given a matrix $X \in \mathbb{R}^{p \times n}$ with independent Gaussian entries, the goal is to determine whether or not a sparse, non-null fraction of rows in $X$ exhibits a shift in mean at a common index between $1$ and $n$. We focus on three aspects of this problem: the sparsity of non-null rows, the presence of a single, common changepoint in the non-null rows, and the signal strength associated with the changepoint. Within an asymptotic regime relating the data dimensions $n$ and $p$ to the signal sparsity and strength, the information-theoretic limits of this testing problem are characterized by a formula that determines whether or not there exists a testing procedure whose sum of Type I and II errors tends to zero as $n,p \to \infty$. The formula, called the \emph{detection boundary}, partitions the parameter space into two regions: one where it is possible to detect the presence of a single aligned changepoint (detectable region), and another where no test is able to consistently distinguish the mean matrix from one with constant rows (undetectable region).
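To make the data model concrete, here is a minimal simulation sketch of the setting described in the abstract. It rests on assumptions the abstract does not state: unit-variance noise, a one-sided shift of equal size `delta` in every non-null row, and a changepoint location `t` that is known to the analyst. The function names `simulate` and `cusum` are hypothetical. The row-wise statistic is a standard standardized mean contrast (CUSUM-type), used here only for illustration; it is not the penalized Berk-Jones procedure the paper develops.

```python
import numpy as np


def simulate(p, n, s, t, delta, rng):
    """Draw a p x n matrix of N(0, 1) noise (unit variance is an assumption),
    then shift the mean of s randomly chosen rows by delta from column t on."""
    X = rng.standard_normal((p, n))
    rows = rng.choice(p, size=s, replace=False)  # indices of non-null rows
    X[rows, t:] += delta
    return X, rows


def cusum(X, t):
    """Row-wise standardized contrast between the mean of the last n - t
    columns and the mean of the first t columns.
    For a row with constant mean, each entry of the result is N(0, 1)."""
    n = X.shape[1]
    left = X[:, :t].mean(axis=1)
    right = X[:, t:].mean(axis=1)
    scale = np.sqrt(t * (n - t) / n)  # makes the null variance equal to 1
    return scale * (right - left)


# Illustrative parameters (not taken from the paper).
rng = np.random.default_rng(0)
X, rows = simulate(p=200, n=400, s=10, t=200, delta=1.0, rng=rng)
z = cusum(X, t=200)
```

With these illustrative values, each non-null row has a contrast centered at `delta * sqrt(t * (n - t) / n) = 10`, far from the standard normal null rows, so the signal is strong. The regime the paper studies is the harder one, where the per-row signal is weak, the non-null rows are sparse, and the changepoint location is unknown.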
