PCA recovery thresholds in low-rank matrix inference with sparse noise
Urte Adomaityte, Gabriele Sicuro, Pierpaolo Vivo
TL;DR
The paper addresses the recovery of a rank-one signal corrupted by a sparse symmetric noise matrix constructed from a configuration-model graph. The authors develop a replica analysis yielding recursive distributional equations that determine the typical top eigenvalue, the distribution of the top-eigenvector components, and the spike–eigenvector overlap. They derive a sharp recovery threshold $\theta_{\rm crit}$ that generalises the BBP transition to sparse noise, and provide explicit formulas for Poissonian and random regular degree distributions; in the dense-connectivity limit these reduce to the classical BBP results. Numerical diagonalisation corroborates the theoretical predictions and illustrates the phase transition between the non-recovery and recovery regimes. The work advances the understanding of PCA-like recovery under sparse noise and informs future directions in sparse PCA and related spectral methods.
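For orientation, the classical dense BBP result that the sparse threshold generalises can be written as follows. This is a reminder under an assumed normalisation, which may differ from the one used in the paper: the observed matrix is $\theta\, x x^{\top} + W$ with $\|x\| = 1$, and $W$ is a symmetric Wigner matrix whose off-diagonal entries have variance $1/N$, so that its bulk spectrum has edge at $2$. The symbols $\lambda_{\max}$ and $v_{\max}$ (top eigenvalue and top eigenvector) are notation introduced here for illustration.

$$
\lambda_{\max} \xrightarrow[N\to\infty]{}
\begin{cases}
\theta + \dfrac{1}{\theta}, & \theta > 1,\\[4pt]
2, & \theta \le 1,
\end{cases}
\qquad
\langle x, v_{\max}\rangle^{2} \xrightarrow[N\to\infty]{}
\begin{cases}
1 - \dfrac{1}{\theta^{2}}, & \theta > 1,\\[4pt]
0, & \theta \le 1.
\end{cases}
$$

In this normalisation the dense critical signal strength is $\theta = 1$: below it the top eigenvector carries no information about $x$, above it the overlap grows continuously from zero.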
Abstract
We study the high-dimensional inference of a rank-one signal corrupted by sparse noise. The noise is modelled as the adjacency matrix of a weighted undirected graph with finite average connectivity in the large size limit. Using the replica method from statistical physics, we analytically compute the typical value of the top eigenvalue, the top eigenvector component density, and the overlap between the signal vector and the top eigenvector. The solution is given in terms of recursive distributional equations for auxiliary probability density functions, which can be efficiently solved using a population dynamics algorithm. Specialising the noise matrix to Poissonian and random regular degree distributions, we analytically identify the critical signal strength at which the signal becomes recoverable from the top eigenvector, thus generalising the celebrated BBP transition to the sparse noise case. In the large-connectivity limit, known results for dense noise are recovered. The analytical results agree with numerical diagonalisation of large matrices.
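The numerical check mentioned above can be illustrated with a minimal sketch. It is not the authors' code, and every modelling choice in it is an assumption made for illustration: the noise is a weighted Erdős–Rényi graph (Poissonian degrees with mean close to `c`), edge weights are Gaussian with variance `1/c`, the spike `x` is a random unit vector, and the function names `sparse_spiked_matrix` and `top_eig_overlap` are hypothetical. The paper's normalisation and weight distribution may differ.

```python
import numpy as np


def sparse_spiked_matrix(n, c, theta, rng):
    """Build theta * x x^T + sparse symmetric noise (illustrative normalisation)."""
    # Each pair (i < j) is an edge with probability c/n, giving
    # approximately Poissonian degrees with mean c (assumption).
    mask = np.triu(rng.random((n, n)) < c / n, k=1)
    # Gaussian edge weights with variance 1/c (assumption), so each
    # matrix entry has variance 1/n as in the dense Wigner normalisation.
    weights = rng.normal(0.0, 1.0 / np.sqrt(c), size=(n, n))
    upper = mask * weights
    noise = upper + upper.T  # symmetrise; the diagonal stays zero
    # Random unit-norm spike (assumption: delocalised signal).
    x = rng.normal(size=n)
    x /= np.linalg.norm(x)
    return theta * np.outer(x, x) + noise, x


def top_eig_overlap(n, c, theta, seed=0):
    """Return the top eigenvalue and the squared overlap <x, v_max>^2."""
    rng = np.random.default_rng(seed)
    matrix, x = sparse_spiked_matrix(n, c, theta, rng)
    # eigh returns eigenvalues in ascending order for symmetric input.
    vals, vecs = np.linalg.eigh(matrix)
    overlap_sq = float(np.dot(vecs[:, -1], x) ** 2)
    return float(vals[-1]), overlap_sq


if __name__ == "__main__":
    # Scan the signal strength at fixed mean connectivity.
    for theta in (0.0, 1.0, 2.0, 3.0, 5.0):
        lam, q = top_eig_overlap(n=400, c=4.0, theta=theta)
        print(f"theta={theta:.1f}  lambda_max={lam:.3f}  overlap^2={q:.3f}")
```

Scanning `theta` at fixed `c` should show the squared overlap staying near zero for weak signals and becoming of order one for strong signals. At this matrix size the crossover is smeared by finite-size effects, so locating the threshold precisely would need larger `n` and averaging over several seeds.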
