Spectral Properties of Elementwise-Transformed Spiked Matrices

Michael J. Feldman

TL;DR

The paper analyzes the spectral properties of elementwise-transformed spiked matrices, proving a PCA phase transition in high dimensions when recovering a low-rank signal from $Y_n = n^{-1/2} f(\sqrt{n} X_n + Z_n)$. By expanding $f$ in an orthogonal polynomial basis with respect to the noise law $\mu$, it shows that the transformed data behave like a standard spiked model with an effective signal strength $\tau(f,\mu)$, yielding Marchenko–Pastur limiting spectra and explicit outlier behavior for the singular values and vectors; a zero-threshold case is handled via a higher-order $\ell_*$-theory. The work applies to nonlinear, discontinuous transforms and includes concrete applications to binomial data, ReLU activation, and data truncation, deriving practical thresholds and optimal preprocessing/shrinkage rules. Overall, it extends spiked matrix theory to nonlinear elementwise transforms, providing rigorous guidance for PCA in non-Gaussian, discrete, or truncated high-dimensional settings and informing data preprocessing strategies to improve signal recovery.

Abstract

This work concerns elementwise transformations of spiked matrices: $Y_n = n^{-1/2} f(\sqrt{n} X_n + Z_n)$. Here, $f$ is a function applied elementwise, $X_n$ is a low-rank signal matrix, and $Z_n$ is white noise. We find that principal component analysis is powerful for recovering signal under highly nonlinear or discontinuous transformations. Specifically, in the high-dimensional setting where $Y_n$ is of size $n \times p$ with $n,p \rightarrow \infty$ and $p/n \rightarrow \gamma > 0$, we uncover a phase transition: for signal-to-noise ratios above a sharp threshold -- depending on $f$, the distribution of elements of $Z_n$, and the limiting aspect ratio $\gamma$ -- the principal components of $Y_n$ (partially) recover those of $X_n$. Below this threshold, the principal components of $Y_n$ are asymptotically orthogonal to the signal. In contrast, in the standard setting where $X_n + n^{-1/2}Z_n$ is observed directly, the analogous phase transition depends only on $\gamma$. A similar phenomenon occurs with $X_n$ square and symmetric and $Z_n$ a generalized Wigner matrix.

Paper Structure

This paper contains 11 sections, 24 theorems, 152 equations, 4 figures.

Key Result

Lemma 1.1

(Theorem 3.6 of BS_SpAn, Theorems 2.8--2.10 of BGN12) In the asymmetric setting, let $Y_n \coloneqq X_n + n^{-1/2} Z_n$, where the elements of $Z_n$ have mean zero, variance one, and finite moments. The empirical spectral distribution (ESD) of $Y_n^\top Y_n$ converges almost surely weakly to the Mar... where the biasing function $\lambda(\sigma, \gamma)$ is given by ... The limiting angles between the s...
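The display defining $\lambda(\sigma, \gamma)$ is cut off in this excerpt. As a rough illustration only, the sketch below evaluates the closed forms commonly stated for the standard spiked model $X_n + n^{-1/2} Z_n$ with an $n \times p$ matrix and $\gamma = p/n$. The function names are hypothetical, and the formulas are an assumption taken from the classical spiked-model literature rather than from the truncated lemma text, so the paper's notation or normalization may differ.

```python
import math

def pca_threshold(gamma):
    """Recovery threshold gamma**(1/4) of the standard spiked model (assumed form)."""
    return gamma ** 0.25

def biasing(sigma, gamma):
    """Assumed limit of a singular value of Y_n whose signal singular value is sigma.

    Below the threshold the limit sits at the bulk edge 1 + sqrt(gamma).
    """
    if sigma <= pca_threshold(gamma):
        return 1.0 + math.sqrt(gamma)
    s2 = sigma * sigma
    return math.sqrt((1.0 + s2) * (gamma + s2)) / sigma

def cos2_left(sigma, gamma):
    """Assumed limiting squared cosine for the n-dimensional (left) singular vectors."""
    s4 = sigma ** 4
    if s4 <= gamma:
        return 0.0
    return (s4 - gamma) / (s4 + sigma ** 2)

def cos2_right(sigma, gamma):
    """Assumed limiting squared cosine for the p-dimensional (right) singular vectors."""
    s4 = sigma ** 4
    if s4 <= gamma:
        return 0.0
    return (s4 - gamma) / (s4 + gamma * sigma ** 2)
```

Under these assumed formulas, `pca_threshold(0.5)` agrees with the value $\gamma^{1/4} \approx .841$ quoted in the Figure 3 caption, and `biasing` is continuous at the threshold, where it equals the bulk edge $1 + \sqrt{\gamma}$.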

Figures (4)

  • Figure 1: The eigenvalues of the sample covariance of data in N08 plotted in increasing order (left) and the histogram of eigenvalues compared to the Marchenko--Pastur distribution (right).
  • Figure 2: Cosine similarities between the singular vectors of $X_n$ and $Y_n$ under (\ref{d3x}), with $n = 5000$, $p = 2500$, $\gamma = 1/2$, and $m = 2$ (left) or $m = \lfloor \sqrt{n} \rfloor$ (right). The singular vectors of $X_n$ were generated uniformly on the unit sphere. There is close agreement between theory (solid lines) and simulations (points, each representing the average of 25 simulations).
  • Figure 3: Left: Cosine similarities in the symmetric setting between the eigenvectors of $X_n$ and $X_n + n^{-1/2} Z_n$ (blue) and $X_n$ and $Y_n$ (orange). The elements of $Z_n$ have a bimodal distribution, $n = 5000$, and $Y_n = n^{-1/2} f^*(X_n + n^{-1/2} Z_n)$ where $f^*$ is the transformation introduced in Corollary \ref{cor3}. Application of $f^*$ reduces the recovery threshold of PCA from $1$ to roughly $.587$. Right: Cosine similarities between the singular vectors of $X_n$ and $Y_n$ with $f(z) = \max(z,0) - (2\pi)^{-1/2}$, the ReLU function, $n = 5000$, $p = 2500$, and $\gamma = 1/2$. This transformation increases the recovery threshold of PCA from $\gamma^{1/4} \approx .841$ to $\gamma^{1/4} \tau(f,\mu_\phi) \approx .982$. In both plots, there is close agreement between theory (solid lines) and simulations (points, each representing the average of 25 simulations).
  • Figure 4: Left: Cosine similarities between the left singular vectors of $X_n$ and $Y_n = n^{-1/2} f_c(\sqrt{n}X_n+Z_n)$ with Cauchy-distributed noise and $c = c^*$ (blue) and $c =1$ (orange). Cosine similarities between the singular vectors of $X_n$ and the raw data $X_n + n^{-1/2}Z_n$ are not plotted as they are $O(n^{-1/2})$ over the domain of this plot. There is close agreement between theory (solid lines) and simulations (points, each representing the average of 25 simulations). Right: Under Cauchy-distributed noise, $\tau(f_c,\mu)$ is maximized at $c^* \approx 2.028$.
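The ReLU threshold quoted in the Figure 3 caption can be checked numerically. The sketch below assumes, without confirmation from this excerpt, that $\mu_\phi$ is the standard Gaussian law and that $\tau(f,\mu) = \|f\|_\mu / |a_1|$, where $a_1 = \mathbb{E}[f(Z) Z]$ is the first coefficient of $f$ in the orthogonal polynomial basis and $\|f\|_\mu^2 = \mathbb{E}[f(Z)^2]$. Under that assumption the computed threshold matches the caption's value of approximately 0.982; the helper names are hypothetical.

```python
import math

def gaussian_expectation(g, lo=-10.0, hi=10.0, steps=20_000):
    """Trapezoid-rule approximation of E[g(Z)] for Z ~ N(0, 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * g(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def relu_centered(z):
    """f(z) = max(z, 0) - (2*pi)**(-1/2), the transform from the Figure 3 caption."""
    return max(z, 0.0) - 1.0 / math.sqrt(2.0 * math.pi)

def tau(f):
    """Assumed form tau = ||f|| / |a_1| under standard Gaussian noise."""
    a1 = gaussian_expectation(lambda z: f(z) * z)       # first-order coefficient
    norm = math.sqrt(gaussian_expectation(lambda z: f(z) ** 2))
    return norm / abs(a1)

gamma = 0.5
relu_threshold = gamma ** 0.25 * tau(relu_centered)
print(f"tau = {tau(relu_centered):.4f}, threshold = {relu_threshold:.4f}")
```

For the identity transform $f(z) = z$ the same assumed formula gives $\tau = 1$, which recovers the standard threshold $\gamma^{1/4}$.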

Theorems & Definitions (59)

  • Lemma 1.1
  • Remark 1.1
  • Remark 1.2
  • Remark 1.3
  • Lemma 1.2
  • Remark 1.4
  • Lemma 1.3
  • Lemma 1.4
  • Theorem 2.1
  • Corollary 2.2
  • ...and 49 more