Spectral Phase Transition and Optimal PCA in Block-Structured Spiked models

Pierre Mergny, Justin Ko, Florent Krzakala

TL;DR

This work analyzes the inhomogeneous spiked Wigner model with block-structured noise and variance profile $\Delta$, introducing a transformed matrix $\tilde{Y}$ whose top eigenstructure reveals the hidden signal. The authors develop a rigorous spectral analysis based on a Quadratic Vector Equation (QVE) and anisotropic resolvent techniques to establish a BBP-type phase transition: the top eigenvalue detaches from the bulk and the signal overlap becomes positive if and only if $\lambda_1(\boldsymbol{\Omega}_K) > 1$. They provide a formula for the limiting overlap vector, defined implicitly through the QVE solution, and show that PCA on $\tilde{Y}$ achieves the optimal detection threshold among iterative methods, bridging spectral methods and approximate message passing (AMP) approaches in the inhomogeneous setting. The results generalize the classical homogeneous BBP phenomenology to block-structured noise, quantify the algorithmic versus statistical limits, and offer a rigorous pathway for spectral initialization of subsequent AMP algorithms. They also lay groundwork for extensions to higher-rank signals and broader variance-profile shapes, with potential applications to community detection and structured matrix estimation.

Abstract

We discuss the inhomogeneous spiked Wigner model, a theoretical framework recently introduced to study structured noise in various learning scenarios, through the prism of random matrix theory, with a specific focus on its spectral properties. Our primary objective is to find an optimal spectral method and to extend the celebrated Baik-Ben Arous-Péché (BBP) phase transition criterion, well known in the homogeneous case, to our inhomogeneous, block-structured Wigner model. We provide a thorough rigorous analysis of a transformed matrix and show that the transition for the appearance of 1) an outlier outside the bulk of the limiting spectral distribution and 2) a positive overlap between the associated eigenvector and the signal occurs precisely at the optimal threshold, making the proposed spectral method optimal within the class of iterative methods for the inhomogeneous Wigner problem.
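
For orientation, the classical homogeneous BBP transition that the abstract refers to can be checked numerically. The sketch below is not the authors' transformed matrix: it assumes the standard rank-one model $Y = (\theta/N)\, x x^\top + W/\sqrt{N}$ with unit-variance Gaussian noise, for which the top eigenvalue is known to leave the bulk edge $2$ and approach $\theta + 1/\theta$, with squared overlap $1 - 1/\theta^2$, once $\theta > 1$. The function name `spiked_wigner_top_eig` and the parameter `theta` are illustrative choices, not notation from the paper.

```python
import numpy as np


def spiked_wigner_top_eig(n, theta, seed=0):
    """Sample a homogeneous rank-one spiked Wigner matrix (an assumed toy
    model, not the paper's block-structured one) and return the largest
    eigenvalue together with the squared overlap of its eigenvector with x."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                 # signal with i.i.d. N(0, 1) entries
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2.0)               # symmetric noise, off-diagonal variance 1
    y = theta * np.outer(x, x) / n + w / np.sqrt(n)
    vals, vecs = np.linalg.eigh(y)             # eigenvalues in ascending order
    v = vecs[:, -1]                            # unit-norm top eigenvector
    overlap_sq = (v @ x) ** 2 / (x @ x)        # squared cosine between v and x
    return vals[-1], overlap_sq


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0):
        lam, ov = spiked_wigner_top_eig(800, theta, seed=1)
        print(f"theta={theta}: top eigenvalue={lam:.3f}, squared overlap={ov:.3f}")
```

Below the threshold the top eigenvalue sits at the bulk edge and the overlap is close to zero; above it both quantities move to the values quoted in the lead-in, up to finite-size fluctuations.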

Paper Structure (28 sections, 22 theorems, 63 equations, 2 figures)

Key Result

Theorem 2.1

We have the following phase transition for the outlier and the overlap

Figures (2)

  • Figure 1: Eigenvalue Distribution and Top Eigenvalue Position for Different Values of $\lambda_1(\boldsymbol{\Omega}_K)$. The three figures correspond to a model with $K=2$, $\boldsymbol{\rho} = (1/2,1/2)$ and $\boldsymbol{S}_K = {\small \begin{pmatrix} t & 1/2 \\ 1/2 & 1/4 \end{pmatrix}}$, with a different value of the parameter $t$ in each figure, chosen such that (Left) $\lambda_1(\boldsymbol{\Omega}_K) = 0.5 < 1$, (Center) $\lambda_1(\boldsymbol{\Omega}_K) = 1.0$ and (Right) $\lambda_1(\boldsymbol{\Omega}_K) = 3.0 > 1$. The black curve corresponds to the theoretical value of the limiting spectral distribution (LSD) obtained by solving the QVE of Eq. \ref{eq:QVE} numerically, while the colored histogram corresponds to the empirical distribution of a sample $\tilde{\boldsymbol{Y}}/\sqrt{N}$ with $N=3000$, and the red triangle corresponds to the empirical value of its largest eigenvalue. The signal $\boldsymbol{x}$ has been sampled from a standard normal distribution. Before the transition (left), the rightmost edge is below one and there is no outlier; at the transition (center), the rightmost edge touches the value one; and after the transition (right) there is an outlier at one.
  • Figure 2: Overlap Vector for Different Values of $\lambda_1(\boldsymbol{\Omega}_K)$ and Different Models. (Left) Value of the square of the overlaps for a model with $K=2$, $\boldsymbol{\rho} = (1/2, 1/2)$ and $\boldsymbol{S}_K = {\small \begin{pmatrix} t & 1/2 \\ 1/2 & 1/2 \end{pmatrix}}$, where the range of the parameter $t$ is set such that $\lambda_1(\boldsymbol{\Omega}_K)$ varies from $0.5$ to $3.5$. (Right) Value of the square of the overlaps for a model with $K=2$, $\boldsymbol{\rho} = (1/2, 1/2)$ and $\boldsymbol{S}_K = {\small \begin{pmatrix} 1 & t \\ t & 1/2 \end{pmatrix}}$, where the range of the parameter $t$ is set such that $\lambda_1(\boldsymbol{\Omega}_K)$ varies from $0.5$ to $3.5$. In both cases, the dots represent an average over $10$ samples of the empirical value of the overlaps for $N=3000$, with $\boldsymbol{x}$ sampled from a standard Gaussian distribution.
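
The Figure 1 caption mentions solving a QVE numerically to obtain the limiting spectral distribution. The paper's own equation (Eq. \ref{eq:QVE}) is not reproduced on this page, so the sketch below uses the generic quadratic vector equation for a block Wigner-type matrix, $m_a(z) = -1/\big(z + \sum_b S_{ab}\,\rho_b\, m_b(z)\big)$, as a stand-in; the equation for the transformed matrix $\tilde{Y}$ may contain additional terms. The names `solve_block_qve` and `density`, the damping factor, and the smoothing parameter `eta` are assumptions made for illustration.

```python
import numpy as np


def solve_block_qve(z, S, rho, n_iter=5000, tol=1e-12):
    """Damped fixed-point iteration for the generic block QVE
    m_a = -1 / (z + sum_b S_ab * rho_b * m_b), valid for Im z > 0.
    This is an assumed stand-in, not the paper's Eq. (eq:QVE)."""
    S = np.asarray(S, dtype=float)
    rho = np.asarray(rho, dtype=float)
    m = np.full(len(rho), 1j, dtype=complex)   # start in the upper half-plane
    for _ in range(n_iter):
        m_new = -1.0 / (z + S @ (rho * m))
        m_next = 0.5 * (m + m_new)             # averaging stabilises the iteration
        if np.max(np.abs(m_next - m)) < tol:
            return m_next
        m = m_next
    return m


def density(x, S, rho, eta=0.05):
    """Approximate spectral density at real x from Im m(x + i*eta) / pi,
    averaged over blocks with weights rho."""
    m = solve_block_qve(x + 1j * eta, S, rho)
    return float(np.sum(rho * m.imag) / np.pi)


if __name__ == "__main__":
    # Flat variance profile: the QVE reduces to the semicircle law.
    S_flat = [[1.0, 1.0], [1.0, 1.0]]
    rho = [0.5, 0.5]
    print(solve_block_qve(1j, S_flat, rho))
    print(density(0.0, S_flat, rho))
```

With a flat profile every block carries the same Stieltjes transform and the output can be compared against the semicircle law, which is a convenient sanity check before plugging in a non-trivial profile. Convergence slows down near spectral edges as `eta` shrinks, so a smaller `eta` generally needs more iterations.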

Theorems & Definitions (23)

  • Theorem 2.1
  • Proposition 3.1: Quadratic Vector Equation for the Stieltjes
  • Proposition 3.2: Continuation on the real line
  • Lemma 3.3
  • Proposition 3.4
  • Corollary 3.5
  • Proposition 3.6: The rightmost edge is bounded by one
  • Lemma 3.7: Positivity and Monotonicity of $\boldsymbol{g}(\cdot)$ above the rightmost edge
  • Proposition 3.8
  • Proposition 3.9: Equation for outliers
  • ...and 13 more