Spectral analysis of spatial-sign covariance matrices for heavy-tailed data with dependence

Hantao Chen, Cheng Wang

TL;DR

The paper establishes the limiting spectral distribution (LSD) and a central limit theorem (CLT) for linear spectral statistics of the spatial-sign covariance matrix, a self-normalized sample covariance matrix, for α-regularly varying heavy-tailed data with general covariance Σ. Using concentration properties of self-normalized random variables and precise quadratic-form analyses, it proves that the Marčenko-Pastur (MP) equation governs the LSD when α≥2 and that the CLT for linear spectral statistics holds when α>4, with explicit mean and covariance structures that depend on the tail index through τ=E[Z^4]. The results extend classical random-matrix theory to robust, self-normalized constructions in heavy-tailed, dependent settings, and both tail conditions are shown to be nearly the weakest possible. These findings provide principled spectral tools for high-dimensional multivariate analysis of heavy-tailed, dependent data, with practical implications for robust PCA and related methods.

Abstract

This paper investigates the spectral properties of spatial-sign covariance matrices, a self-normalized version of sample covariance matrices, for data from $\alpha$-regularly varying populations with general covariance structures. By exploiting the elegant properties of self-normalized random variables, we establish the limiting spectral distribution and a central limit theorem for linear spectral statistics. We demonstrate that the Marčenko-Pastur equation holds under the condition $\alpha\geq 2$, while the central limit theorem for linear spectral statistics is valid for $\alpha>4$, which are shown to be nearly the weakest possible conditions for spatial-sign covariance matrices from heavy-tailed data in the presence of dependence.
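
As a rough illustration of the object studied in the abstract, the following sketch builds a spatial-sign covariance matrix from heavy-tailed data. Several choices here are assumptions and are not taken from the paper: the function name `spatial_sign_covariance`, the $p/n$ scaling (chosen so that the trace equals $p$), a known location of zero (no centering), and the use of i.i.d. Student's $t$ entries with 3 degrees of freedom as an example of a regularly varying population with tail index 3.

```python
import numpy as np

def spatial_sign_covariance(X):
    """Return (p/n) * sum_i s_i s_i^T with s_i = x_i / ||x_i||.

    X has shape (n, p), one observation per row. The p/n scaling is an
    assumed normalization (it gives tr(B) = p); the paper's exact
    convention may differ. Observations are assumed to have location zero.
    """
    n, p = X.shape
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    S = X / norms              # self-normalize: project each row onto the unit sphere
    return (p / n) * (S.T @ S)

rng = np.random.default_rng(0)
n, p = 400, 200
# Student's t with 3 degrees of freedom: heavy-tailed, tail index 3 (assumed example)
X = rng.standard_t(df=3, size=(n, p))

B = spatial_sign_covariance(X)
eigvals = np.linalg.eigvalsh(B)  # eigenvalues of the symmetric matrix B, ascending
print("trace of B:", np.trace(B))
print("smallest / largest eigenvalue:", eigvals[0], eigvals[-1])
```

Because every row is rescaled to unit length, each term $s_i s_i^\top$ has trace one regardless of how heavy the tails are, which is the self-normalization property the abstract refers to.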

Paper Structure

This paper contains 26 sections, 19 theorems, 201 equations, 3 figures.

Key Result

Proposition 2.1

If $Z_1,Z_2,\cdots$ are i.i.d. regularly varying with $\alpha\geq 2$, for fixed positive integer $r>0$, we have

Figures (3)

  • Figure 1: The empirical spectral distributions (ESDs) of ${\bf B}$ for tail index $\alpha\in\{0.5,1,2,4\}$ with $(n,p)=(400,200)$, compared with the Marčenko-Pastur law with $y=0.5$.
  • Figure 2: The sup-norm distance between the ESDs and the generalized Marčenko-Pastur law as $\alpha$ grows, for different dimension settings $(n,p)$.
  • Figure 3: Histograms of the statistic $\operatorname{tr}({\bf B}^2)$ and the theoretical CLT density for $\alpha=4.5$ and $n=p$.
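
The kind of comparison described in the first two captions can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' experiment: it uses identity covariance (so the standard rather than the generalized Marčenko-Pastur law applies), i.i.d. Student's $t$ entries with 4 degrees of freedom, a $p/n$ scaling of ${\bf B}$, and the hypothetical helper names `mp_cdf` and `sup_distance_to_mp`.

```python
import numpy as np

def mp_cdf(x, y, grid_size=20001):
    """CDF of the standard Marchenko-Pastur law with ratio 0 < y < 1,
    obtained by trapezoidal integration of its density."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2   # support edges
    grid = np.linspace(a, b, grid_size)
    dens = np.sqrt(np.clip((b - grid) * (grid - a), 0.0, None)) / (2 * np.pi * y * grid)
    steps = 0.5 * (dens[1:] + dens[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]                                         # absorb discretization error
    return np.interp(x, grid, cdf, left=0.0, right=1.0)

def sup_distance_to_mp(eigvals, y):
    """Sup-norm distance between the ESD step function and the MP CDF."""
    lam = np.sort(eigvals)
    p = lam.size
    F = mp_cdf(lam, y)
    upper = np.arange(1, p + 1) / p   # ESD value at each eigenvalue
    lower = np.arange(0, p) / p       # ESD value just below each eigenvalue
    return max(np.max(np.abs(upper - F)), np.max(np.abs(lower - F)))

rng = np.random.default_rng(1)
n, p = 400, 200
X = rng.standard_t(df=4, size=(n, p))                 # tail index 4 (assumed example)
S = X / np.linalg.norm(X, axis=1, keepdims=True)      # spatial signs
B = (p / n) * (S.T @ S)                               # assumed scaling, tr(B) = p
d = sup_distance_to_mp(np.linalg.eigvalsh(B), y=p / n)
print(f"sup-norm distance to MP law: {d:.4f}")
```

Changing `df` gives a quick way to probe how the distance behaves for other tail indices; the sketch makes no claim about reproducing the reported figures.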

Theorems & Definitions (35)

  • Definition 2.1: $\alpha$-regularly varying distribution
  • Example : Student's $t$-distribution
  • Example : Pareto distribution
  • Proposition 2.1
  • Proposition 2.2
  • Proposition 2.3
  • Proposition 2.4
  • Proposition 2.5
  • Proposition 2.6
  • Proposition 2.7
  • ...and 25 more