Robust mean change point testing in high-dimensional data with heavy tails

Mengchu Li, Yudong Chen, Tengyao Wang, Yi Yu

TL;DR

This work addresses robust mean change point testing in high-dimensional data with heavy-tailed noise, introducing two tail classes, sub-Weibull $\mathcal{G}_{\alpha,K}$ and finite-moments $\mathcal{P}_{\alpha,K}$, and characterizing minimax rates across dense and sparse regimes. It develops dense testing via CUSUM-based statistics and sparse testing via sample-splitting and robust mean estimators (median-of-means, MoM, and RSM), achieving near-optimal rates up to iterated-logarithmic factors and revealing tail-dependent phase transitions. The paper also presents adaptive procedures that achieve these rates without knowing the sparsity level, and extends the framework to multiple change points, temporal dependence, and weaker moment conditions; it further shows that sub-Gaussian rates are attainable under an additional minimal spacing condition on the alternative. Overall, it quantifies the costs of heavy-tailedness in high-dimensional change point testing and provides computationally feasible, robust tools applicable in finance, neuroscience, and genomics, where heavy tails are common.

Abstract

We study mean change point testing problems for high-dimensional data with exponentially- or polynomially-decaying tails. In each case, depending on the $\ell_0$-norm of the mean change vector, we separately consider dense and sparse regimes. We characterise the boundary between the dense and sparse regimes under the above two tail conditions for the first time in the change point literature and propose novel testing procedures that attain optimal rates in each of the four regimes up to a poly-iterated logarithmic factor. By comparing with previous results under Gaussian assumptions, our results quantify the costs of heavy-tailedness on the fundamental difficulty of change point testing problems for high-dimensional data. To be specific, when the error distributions possess exponentially-decaying tails, a CUSUM-type statistic is shown to achieve a minimax testing rate up to $\sqrt{\log\log(8n)}$. As for polynomially-decaying tails, admitting bounded $\alpha$-th moments for some $\alpha \geq 4$, we introduce a median-of-means-type test statistic that achieves a near-optimal testing rate in both dense and sparse regimes. In the sparse regime, we further propose a computationally-efficient test to achieve optimality. Our investigation of the even more challenging case $2 \leq \alpha < 4$ unveils a new phenomenon: the minimax testing rate has no sparse regime, i.e. testing sparse changes is information-theoretically as hard as testing dense changes. Finally, we consider various extensions where we also obtain near-optimal performance, including testing against multiple change points, allowing temporal dependence, and allowing fewer than two finite moments in the data generating mechanisms. We also show how sub-Gaussian rates can be achieved when an additional minimal spacing condition is imposed under the alternative hypothesis.
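To make the two building blocks named in the abstract concrete, here is a minimal Python sketch of a textbook CUSUM transformation with a dense, $\ell_2$-type aggregation, and of a coordinate-wise median-of-means estimator. It is an illustration under stated assumptions, not the paper's test statistics: the function names, the centring by $p$ (which presumes unit-variance, independent coordinates), and the equal-size blocking are all choices made here, and no threshold calibration is attempted.

```python
import numpy as np


def cusum_transform(x):
    """Return the (n-1) x p matrix of CUSUM vectors of an n x p data matrix.

    Row t-1 equals sqrt(t(n-t)/n) * (mean of rows 1..t - mean of rows t+1..n).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:                      # treat a 1-D series as n x 1
        x = x[:, None]
    n = x.shape[0]
    t = np.arange(1, n)[:, None]         # candidate change locations 1..n-1
    csum = np.cumsum(x, axis=0)
    left_mean = csum[:-1] / t
    right_mean = (csum[-1] - csum[:-1]) / (n - t)
    return np.sqrt(t * (n - t) / n) * (left_mean - right_mean)


def dense_cusum_statistic(x):
    """Maximum over t of the squared l2-norm of the CUSUM vector minus p.

    Assumption (illustrative): coordinates are independent with unit
    variance, so each squared norm has mean p when there is no change.
    """
    c = cusum_transform(x)
    p = c.shape[1]
    return float(np.max(np.sum(c ** 2, axis=1) - p))


def median_of_means(x, n_blocks):
    """Coordinate-wise median of the means of n_blocks consecutive blocks."""
    x = np.asarray(x, dtype=float)
    blocks = np.array_split(x, n_blocks, axis=0)
    block_means = np.stack([b.mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)
```

A large value of `dense_cusum_statistic` suggests a mean change somewhere in the series; `median_of_means` is far less sensitive to a single extreme observation than the sample mean, which is why estimators of this type are natural under polynomially-decaying tails.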

Paper Structure (50 sections, 39 theorems, 355 equations, 3 figures, 4 tables)

Key Result

Theorem 1

Let $0 < \alpha \leq 2$ and $K > 0$. For any $\varepsilon \in (0,1)$, there exist constants $C_1, C_2 > 0$ depending only on $\alpha$, $K$ and $\varepsilon$, such that the test $\phi_{\mathcal{G}, \mathrm{dense}}$ defined in eq:test_sub_dense with [...] satisfies [...] as long as $\rho^2 \geq C_2 v_{\mathcal{G}, \mathrm{dense}}^{\mathrm{U}}$, where [...]

Figures (3)

  • Figure 1: Minimax testing rate transition boundaries between dense and sparse regimes when the distribution of the error matrix belongs to $\mathcal{P}_{\alpha, K}^{\otimes}$ (left panel) and $\mathcal{G}_{\alpha, K}^{\otimes}$ (right panel). The left panel plots the curve $\gamma(\alpha) = (\alpha-2)^{-1} \wedge 1/2$ for $\alpha \in [2, \infty)$, and the two regimes are separated by $s^*_{\mathcal{P}} = p^{1/2-\gamma}$. The right panel plots the curve $\beta(\alpha) = 2/\alpha$ for $\alpha \in (0,2]$, and the two regimes are separated by $s^*_{\mathcal{G}} \asymp \sqrt{p}\log^{-\beta}(ep)$.
  • Figure 2: Simulation results for noise GG(2) (left panel) and GG(0.5) (right panel). The dashed line represents the 0.5 contour of the power.
  • Figure 3: Simulation results for noise Nt(8) (left panel) and Nt(3) (right panel). The dashed line represents the 0.5 contour of the power.
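The transition boundaries in the Figure 1 caption can be evaluated directly. The following Python sketch computes $\gamma(\alpha)$, $\beta(\alpha)$ and the corresponding sparsity boundaries exactly as written in that caption; the function names are hypothetical, and because $s^*_{\mathcal{G}}$ is stated only up to constants ($\asymp$), the `const=1.0` default is an assumption made purely for illustration.

```python
import math


def gamma(alpha):
    """gamma(alpha) = min(1/(alpha - 2), 1/2) for alpha >= 2 (Figure 1, left)."""
    if alpha < 2:
        raise ValueError("gamma is defined for alpha >= 2")
    if alpha == 2:
        return 0.5  # 1/(alpha - 2) is infinite here, so the minimum is 1/2
    return min(1.0 / (alpha - 2), 0.5)


def sparsity_boundary_poly(p, alpha):
    """s*_P = p^(1/2 - gamma(alpha)) for the finite-moment class."""
    return p ** (0.5 - gamma(alpha))


def beta(alpha):
    """beta(alpha) = 2/alpha for alpha in (0, 2] (Figure 1, right)."""
    if not 0 < alpha <= 2:
        raise ValueError("beta is defined for 0 < alpha <= 2")
    return 2.0 / alpha


def sparsity_boundary_weibull(p, alpha, const=1.0):
    """s*_G ~ sqrt(p) * log(e p)^(-beta(alpha)); const is an assumed constant."""
    return const * math.sqrt(p) * math.log(math.e * p) ** (-beta(alpha))
```

For $2 \leq \alpha \leq 4$ the formula gives $\gamma(\alpha) = 1/2$, so $s^*_{\mathcal{P}} = p^0 = 1$; this matches the abstract's statement that there is no sparse regime when $2 \leq \alpha < 4$. As $\alpha$ grows, $\gamma(\alpha) \to 0$ and the boundary approaches $\sqrt{p}$.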

Theorems & Definitions (60)

  • Definition 1: Minimax testing rate
  • Definition 2: $\mathcal{G}_{\alpha, K}$ class of distributions
  • Definition 3: $\mathcal{P}_{\alpha, K}$ class of distributions
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Proposition 7
  • ...and 50 more