
Asymptotic distribution of spiked eigenvalues in the large signal-plus-noise models

Zeqin Lin, Guangming Pan, Peng Zhao, Jia Zhou

Abstract

Consider large signal-plus-noise data matrices of the form $S + Σ^{1/2} X$, where $S$ is a low-rank deterministic signal matrix and the noise covariance matrix $Σ$ can be anisotropic. We establish the asymptotic joint distribution of its spiked singular values when the dimensionality and sample size are comparably large and the signals are supercritical under general assumptions concerning the structure of $(S, Σ)$ and the distribution of the random noise $X$. It turns out that the asymptotic distributions exhibit nonuniversality in the sense of dependence on the distributions of the entries of $X$, which contrasts with what has previously been established for the spiked sample eigenvalues in the context of spiked population models. Such a result yields the asymptotic distribution of the sample spiked eigenvalues associated with mixture models. We also explore the application of these findings in detecting mean heterogeneity of data matrices.
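The model in the abstract can be explored numerically. The following is a minimal simulation sketch, not the authors' code. It assumes $Σ = I$, a rank-one signal with a single nonzero entry, and noise entries scaled by $1/\sqrt{N}$ so that each has variance $1/N$; this normalization is an assumption, since the abstract does not state one. The names `top_singular_values` and `three_point` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def three_point(rng, shape):
    """Entries equal to +-sqrt(3) with probability 1/6 each and 0 with probability 2/3."""
    values = [-np.sqrt(3.0), 0.0, np.sqrt(3.0)]
    return rng.choice(values, size=shape, p=[1 / 6, 2 / 3, 1 / 6])

def gaussian(rng, shape):
    """Standard Gaussian entries."""
    return rng.standard_normal(shape)

def top_singular_values(M=200, N=400, d2=5.25, reps=20, sampler=gaussian, seed=0):
    """Largest singular value of S + X over `reps` independent noise draws.

    Assumptions (not taken from the source): Sigma = I, the only nonzero
    entry of S is S[0, 0] = sqrt(d2), and X has i.i.d. entries with
    mean 0 and variance 1/N.
    """
    rng = np.random.default_rng(seed)
    S = np.zeros((M, N))
    S[0, 0] = np.sqrt(d2)
    out = np.empty(reps)
    for r in range(reps):
        X = sampler(rng, (M, N)) / np.sqrt(N)  # assumed 1/sqrt(N) normalization
        out[r] = np.linalg.svd(S + X, compute_uv=False)[0]
    return out

if __name__ == "__main__":
    for name, sampler in (("gaussian", gaussian), ("three-point", three_point)):
        vals = top_singular_values(sampler=sampler)
        print(name, "mean:", vals.mean(), "std:", vals.std())
```

Comparing the empirical spread of the largest singular value across noise laws is one way to look for the dependence on the entry distribution that the abstract describes; many more replications than the default 20 would be needed for a meaningful histogram.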

Paper Structure

This paper contains 28 sections, 21 theorems, 369 equations, 2 figures, 4 tables.

Key Result

Proposition 2.2

Under the assumptions ranging from bounded moments to spiked spacing, we have

Figures (2)

  • Figure 1: Comparison of spiked eigenvalues between the signal-plus-noise model (Top) and the spiked population model (Bottom). Parameters: 5,000 iterations, $M=200$, $N=400$, $\Sigma = I$ and $(S)_{i \mu} = \sqrt{5.25} \mathbbm{1} (i=1 \text{ and } \mu=1)$. Left: Standard Gaussian. Middle: $\mathbb{P}(x = \pm \sqrt{3}) = 1/6$ and $\mathbb{P}(x = 0) = 2/3$. Right: $\mathbb{P}(x = \pm 1/\sqrt{2}) = 4/9$ and $\mathbb{P}(x = \pm \sqrt{5}) = 1/18$. The latter two are tailored to match the first four moments of the standard Gaussian.
  • Figure 2: Distribution of statistics $\mathrm{DS}_4$ (Top) and $\mathrm{RS}_4$ (Bottom) under the null hypothesis (red) and alternative hypothesis (blue). Parameters: 5,000 iterations, $M=100, 200, 400$, $N=2M$, and $\Sigma = I$. The entries of $\mathbf{w}_\mu$ are i.i.d. from $N(0,1)$. In the simulation under $\mathrm{H}_1$, we specify $K = 2$ with $\mathbf{c}_1 = (1.5, 0, \cdots, 0)$ and $\mathbf{c}_2 = - \mathbf{c}_1$.
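
The Figure 1 caption states that the two discrete noise laws match the first four moments of the standard Gaussian. This can be checked with exact arithmetic, as in the short sketch below. The names `MIDDLE`, `RIGHT`, `even_moment` and `gaussian_even_moment` are hypothetical and introduced only for this check. Each law is stored by squared value so that the irrational points $\sqrt{3}$, $1/\sqrt{2}$ and $\sqrt{5}$ never need to be represented; odd moments vanish by symmetry.

```python
from fractions import Fraction as F

# Symmetric laws from the Figure 1 caption, stored as
# {x**2: total probability of the pair +x and -x}.
MIDDLE = {F(3): F(1, 3), F(0): F(2, 3)}    # P(x = +-sqrt(3)) = 1/6 each, P(x = 0) = 2/3
RIGHT = {F(1, 2): F(8, 9), F(5): F(1, 9)}  # P(x = +-1/sqrt(2)) = 4/9 each, P(x = +-sqrt(5)) = 1/18 each

def even_moment(law, k):
    """Return E[x^(2k)] exactly for a symmetric law given by squared values."""
    return sum(p * sq**k for sq, p in law.items())

def gaussian_even_moment(k):
    """Return E[Z^(2k)] = (2k - 1)!! for a standard normal Z."""
    out = 1
    for j in range(1, 2 * k, 2):
        out *= j
    return out

if __name__ == "__main__":
    for name, law in (("middle", MIDDLE), ("right", RIGHT)):
        assert sum(law.values()) == 1  # probabilities sum to one
        print(name, [str(even_moment(law, k)) for k in (1, 2, 3)])
    print("gaussian", [gaussian_even_moment(k) for k in (1, 2, 3)])
```

Both laws have second moment 1 and fourth moment 3, in agreement with the caption. Their sixth moments are 9 and 14, against 15 for the Gaussian, so the agreement stops at order four.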

Theorems & Definitions (42)

  • Definition 2.1: stochastic domination
  • Proposition 2.2: large deviation bound
  • Definition 2.3: asymptotic quantities for Theorem 2.5
  • Remark 2.4
  • Theorem 2.5: fluctuation of spiked eigenvalues
  • Remark 2.6
  • Remark 2.7
  • Remark 2.8
  • Remark 2.9
  • Definition 4.1
  • ...and 32 more