Sharp phase transitions in high-dimensional changepoint detection

Daniel Xiang; Chao Gao

TL;DR

The paper analyzes high-dimensional changepoint detection in a $p\times n$ Gaussian matrix and identifies a sharp minimax detection boundary $\rho^*(a,\beta)$ in the regime $\log\log n\sim a\log p$ with sparsity $s\sim p^{\beta}$. The upper bound is achieved by an adaptive test based on a penalized Berk-Jones statistic computed over a geometrically growing grid of candidate changepoints. A matching lower bound, obtained from a Bayes (least-favorable prior) construction, establishes the boundary between the detectable and undetectable regions. Results are given for both one-sided and two-sided changepoint alternatives and are connected to the Ingster--Donoho--Jin boundary, Chan et al.’s multiple-changepoint work, and submatrix detection, with extensions to non-asymptotic rates and alternative regimes. Together these give a precise phase diagram for when sparse, aligned changepoints can be detected in high-dimensional data, which can inform offline changepoint analysis of multidimensional measurements.
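The testing idea described above (row-wise evidence for a shift at a candidate changepoint, combined across rows by a Berk-Jones statistic, then maximized over a geometric grid) can be illustrated with a minimal sketch. This is not the paper's procedure: the penalty term is omitted because the summary does not specify it, and the dyadic grid, the CUSUM-type contrast, the unit-variance noise, and all function names are assumptions made for illustration.

```python
import math


def geometric_grid(n):
    """Dyadic candidate changepoints t = 1, 2, 4, ... and their mirrors n - t (assumed grid)."""
    grid = set()
    t = 1
    while t <= n // 2:
        grid.add(t)
        grid.add(n - t)
        t *= 2
    return sorted(grid)


def cusum_z(row, t):
    """Standardized difference of means after vs. before t; N(0,1) under the null with unit variance."""
    n = len(row)
    left = sum(row[:t]) / t
    right = sum(row[t:]) / (n - t)
    return math.sqrt(t * (n - t) / n) * (right - left)


def upper_tail_pvalue(z):
    """One-sided p-value P(N(0,1) > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def binary_kl(x, y):
    """Kullback-Leibler divergence between Bernoulli(x) and Bernoulli(y)."""
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / y)
    if x < 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - y))
    return total


def berk_jones(pvalues):
    """One-sided, unpenalized Berk-Jones statistic: max over i of N * K(i/N, p_(i)) where p_(i) < i/N."""
    N = len(pvalues)
    best = 0.0
    for i, pv in enumerate(sorted(pvalues), start=1):
        frac = i / N
        pv = max(pv, 1e-300)  # guard against underflow to zero before taking logs
        if pv < frac:
            best = max(best, N * binary_kl(frac, pv))
    return best


def scan_statistic(X):
    """Maximum Berk-Jones statistic over the grid; X is a list of rows, each of length n >= 2."""
    n = len(X[0])
    return max(
        berk_jones([upper_tail_pvalue(cusum_z(row, t)) for row in X])
        for t in geometric_grid(n)
    )
```

A large value of `scan_statistic(X)` suggests that some subset of rows shifts upward at a common index. Turning this into a test requires a rejection threshold, which this sketch does not provide.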

Abstract

We study a hypothesis testing problem in the context of high-dimensional changepoint detection. Given a matrix $X \in \mathbb{R}^{p \times n}$ with independent Gaussian entries, the goal is to determine whether or not a sparse, non-null fraction of rows in $X$ exhibits a shift in mean at a common index between $1$ and $n$. We focus on three aspects of this problem: the sparsity of non-null rows, the presence of a single, common changepoint in the non-null rows, and the signal strength associated with the changepoint. Within an asymptotic regime relating the data dimensions $n$ and $p$ to the signal sparsity and strength, the information-theoretic limits of this testing problem are characterized by a formula that determines whether or not there exists a testing procedure whose sum of Type I and II errors tends to zero as $n,p \to \infty$. The formula, called the \emph{detection boundary}, partitions the parameter space into two regions: one where it is possible to detect the presence of a single aligned changepoint (detectable region), and another where no test is able to consistently distinguish the mean matrix from one with constant rows (undetectable region).
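To make the data model concrete, the following minimal sketch simulates one draw from the alternative described in the abstract: a $p \times n$ matrix of independent Gaussian noise in which $s$ randomly chosen rows shift in mean at a common index. Unit noise variance, a zero pre-change mean, an upward shift of equal size in every non-null row, and the names `simulate_matrix`, `t`, and `delta` are illustrative assumptions, not notation from the paper.

```python
import random


def simulate_matrix(p, n, s, t, delta, seed=0):
    """Draw a p x n matrix (list of rows) of N(0,1) noise.

    s rows, chosen uniformly at random, have mean delta added to every
    column with 0-based index >= t; the remaining rows have constant mean 0.
    Returns the matrix and the sorted indices of the non-null rows.
    """
    rng = random.Random(seed)
    non_null = set(rng.sample(range(p), s))
    X = []
    for j in range(p):
        shift = delta if j in non_null else 0.0
        row = [rng.gauss(0.0, 1.0) + (shift if i >= t else 0.0) for i in range(n)]
        X.append(row)
    return X, sorted(non_null)
```

Setting `s = 0` or `delta = 0.0` gives a draw from the null hypothesis, in which every row has constant mean.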

Paper Structure (31 sections, 13 theorems, 341 equations)

Introduction
Related literature in changepoint detection
Organization of the paper
Notation
Main results
One-sided changepoint
Two-sided changepoint
Connections to related works
Related literature in sparse signal detection
Connection to Ingster--Donoho--Jin boundary
Connection to chan2015optimal
Minor technical points
Multiple changepoints
Connection to the non-asymptotic rate of liu2021minimax
Another asymptotic regime
...and 16 more sections

Key Result

Theorem 2.1

For $a > 0$ and $\beta_1 \in (0,1)$, let $\rho^* \coloneqq \rho_{\mathrm{1-side}}^*(a,\beta_1)$, and suppose that $s,n,p$ are related via 3log.

Theorems & Definitions (23)

Theorem 2.1
Theorem 2.2
Theorem 3.1: chan2015optimal
Theorem 3.2
Remark 3.1
Theorem 3.3
Remark 4.1
Lemma B.1
proof
Lemma B.2
...and 13 more
