Asymptotic limits of spiked eigenvalues and eigenvectors of signal-plus-noise matrices with weak signals and heteroskedastic noise
Xiaoyu Liu, Yiming Liu, Guangming Pan, Lingyue Zhang, Zhixiang Zhang
TL;DR
This work develops a comprehensive asymptotic theory for spiked eigenvalues and eigenvectors in signal-plus-noise models with heteroskedastic noise and a potentially growing-rank signal, in the regime $p/n\to c>0$. By establishing deterministic equivalents for resolvents and exact eigenvalue separation, the authors show that the spiked eigenvalues converge to $\varphi(\gamma_k)$ and the associated eigenvectors converge to deterministic projections, with distinct behavior for left and right singular vectors. They extend BBP-type phase-transition ideas to growing-rank signals and nontrivial noise covariance, and propose practical spectral-clustering tools, including EDA/EDB criteria (and pseudo variants for $c>1$) to estimate the number of clusters with strong consistency under gap conditions. The paper also demonstrates, via simulations, the robustness of the proposed criteria across distributions and noise structures and justifies the clustering power of the leading singular vectors. Overall, the results provide both theoretical insight and actionable methods for clustering in high dimensions under general noise covariances.
Abstract
This paper studies a signal-plus-noise model in high-dimensional settings where the dimension and the sample size are comparable. Specifically, we assume that the noise has a general covariance matrix that allows for heteroskedasticity, and that the deterministic signal has the same magnitude as the noise and can have a rank that tends to infinity. We derive the asymptotic limits of the left and right spiked singular vectors of the signal-plus-noise data matrix and the limits of the spiked eigenvalues of the corresponding Gram matrix. As an application, we propose a new criterion to estimate the number of clusters in clustering problems.
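
To make the setting concrete, the following is a minimal simulation sketch of a signal-plus-noise data matrix with heteroskedastic noise and a low-rank signal built from cluster means. Everything in it is an illustrative assumption rather than the paper's construction: the form $X = A + \Sigma^{1/2} W$ with a diagonal $\Sigma$, the $1/\sqrt{n}$ scaling, the parameter values, and the function names `simulate_gram_eigenvalues` and `largest_ratio_estimate`. In particular, the largest-ratio rule is a naive stand-in used only to show the separation between spiked and bulk eigenvalues; it is not the criterion proposed in the paper.

```python
import numpy as np

def simulate_gram_eigenvalues(p=200, n=400, k=3, strength=10.0, seed=0):
    """Eigenvalues of X X^T for X = A + Sigma^{1/2} W (illustrative scaling)."""
    rng = np.random.default_rng(seed)
    # Heteroskedastic noise: diagonal Sigma with unequal variances (assumed form).
    noise_var = rng.uniform(0.5, 1.5, size=p)
    W = rng.standard_normal((p, n)) / np.sqrt(n)  # entries with variance 1/n
    noise = np.sqrt(noise_var)[:, None] * W
    # Rank-k signal: each column is one of k cluster means, scaled by 1/sqrt(n).
    labels = np.arange(n) % k  # balanced cluster assignment
    means = rng.standard_normal((p, k))
    means *= strength / np.linalg.norm(means, axis=0)
    A = means[:, labels] / np.sqrt(n)
    X = A + noise
    # Squared singular values of X are the eigenvalues of the Gram matrix X X^T,
    # returned in descending order.
    return np.linalg.svd(X, compute_uv=False) ** 2

def largest_ratio_estimate(eigs, max_k=10):
    """Naive estimate of the number of spikes: position of the largest
    ratio between consecutive eigenvalues among the top max_k."""
    ratios = eigs[:max_k] / eigs[1:max_k + 1]
    return int(np.argmax(ratios)) + 1

if __name__ == "__main__":
    eigs = simulate_gram_eigenvalues()
    print("top eigenvalues:", np.round(eigs[:6], 2))
    print("estimated number of clusters:", largest_ratio_estimate(eigs))
```

With a signal this strong relative to the noise, one would expect the leading `k` eigenvalues to sit well above the rest of the spectrum; weakening `strength` moves them toward the bulk, which is the regime the paper's asymptotic theory addresses.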
