Table of Contents
Fetching ...

BBP Phase Transition for an Extensive Number of Outliers

Niklas Forner, Alexander Maloney, Bernd Rosenow

Abstract

Random-matrix theory helps disentangle signal from noise in large data sets. We analyze rectangular $p \times q$ matrices $W = W_0 + M$ in which the noise $M$ generates a Marchenko-Pastur bulk, whereas the signal $W_0$ injects an extensive set of degenerate singular values. Keeping $\mathrm{rank}$ $W_0/q$ finite as $p,q \to \infty$, we show that the singular value density obeys a quartic equation and derive explicit asymptotics in the strong-signal regime. The resulting generalized Baik-Ben Arous-Péché phase diagram yields a scaling law for the critical signal strength and clarifies how a finite density of spikes reshapes the bulk edges. Numerical simulations validate the theory and illustrate its relevance for high-dimensional inference tasks.

BBP Phase Transition for an Extensive Number of Outliers

Abstract

Random-matrix theory helps disentangle signal from noise in large data sets. We analyze rectangular matrices in which the noise generates a Marchenko-Pastur bulk, whereas the signal injects an extensive set of degenerate singular values. Keeping finite as , we show that the singular value density obeys a quartic equation and derive explicit asymptotics in the strong-signal regime. The resulting generalized Baik-Ben Arous-Péché phase diagram yields a scaling law for the critical signal strength and clarifies how a finite density of spikes reshapes the bulk edges. Numerical simulations validate the theory and illustrate its relevance for high-dimensional inference tasks.

Paper Structure

This paper contains 5 sections, 38 equations, 6 figures.

Figures (6)

  • Figure 1: Splitting of the spectrum for increasing signals at a fixed rank ratio. $\sigma = 1$, $\mathcal{A} = 2$, $\tilde{r} = 0.2$. The simulations for the spectrum of $W^{\top} W$ are conducted for $p \times q$ matrices $W = M + W_0$ with $p = \mathcal{A} q$ and $q = 1000$, averaged over a number of ten runs. Each entry in $M$ is drawn from a centered normal distribution with unit variance. The Marchenko-Pastur (MP) law acts as a reference for the noise-only case.
  • Figure 2: Phase diagram for $\mathcal{A} \in \{ 1, 2, 10 \}$, $\sigma = 1$. In the MP (Marchenko-Pastur) phase the spectrum exhibits one bulk, in the signal phase two -- the noise and signal bulk. Numerical procedure: for a fixed $\tilde{r}$, the signal is increased until the discriminant of Eq. \ref{['eq:self-consistency-with-sum']}, see Appendix, no longer gives two but four roots (or three at the exact phase transition). Each root corresponds to a boundary of the spectrum.
  • Figure 3: A zoom-in on the low-rank regime. By subtracting $\sigma^2 / \sqrt{\mathcal{A}}$ from the critical signals, we reveal the convergence to the BBP phase transition in the limit $\tilde{r} \to 0$ with a uniform scaling. The dashed lines are fits with rounded slopes $0.35$, $0.35$, and $0.34$ for $\mathcal{A} \in \{ 1, 2, 10 \}$ with rounded multiplicative constants $3.0 \cdot \mathcal{A}^{- 0.66}$. This suggests the scaling law $3 \mathcal{A}^{-2/3} \,\tilde{r}^{1/3}$.
  • Figure 4: Approximate spectral density by Eq. \ref{['eq:approximate-outlier-PDF']} in comparison with the exact spectrum and simulations for a $q = 500$ matrix averaged over 10 runs. $\mathcal{A} = 2$, $\sigma = 1$, $\tilde{r} = 0.2$, $\vartheta^2 = 10$.
  • Figure 5: Spectral density for five distinct signals $\{ \vartheta^2_{i} \}_{i = 1, \dotsc, 5} = \{ 5, 9, 10, 14, 20 \}$, each degenerate with a rank ratio $\tilde{r}_{i} = 1 / 500$. The exact solution is determined numerically via recursion of Eq. \ref{['eq:self-consistency-arbitrary-signal']}, as an approximate solution we add up the asymptotic density \ref{['eq:approximate-outlier-PDF']} for each signal separately, and the histogram stems from numerical simulations. We conduct 100 runs for a finite $p \times q$ dim. matrix with $q = 500$, $p = \mathcal{A} q$, and $\mathcal{A} = 2$.
  • ...and 1 more figures