Eigenvector Overlaps of Random Covariance Matrices and their Submatrices

Elie Attal, Romain Allez

TL;DR

This work extends the study of eigenvector overlaps to rectangular Gaussian (Wishart) matrices by analyzing the dynamics of singular vectors under Brownian perturbations. It introduces three resolvents, S_V, S_U, and S_W, and shows that in the macroscopic limit they satisfy a deterministic coupled PDE system. The system is solved by the method of characteristics, and an explicit inversion then yields the limiting mean squared overlaps between the singular vectors of a full matrix and those of a macroscopic submatrix. The results cover general initial data A and specialize to the Marchenko-Pastur case A = 0, where the overlaps become explicit Cauchy-like functions. These formulas describe PCA-related overlap quantities under subsampling or under removal of features or samples, with potential implications for incremental PCA, missing data, and covariance estimation in high-dimensional settings.

Abstract

We consider the singular vectors of any $m \times n$ submatrix of a rectangular $M \times N$ Gaussian matrix and study their asymptotic overlaps with those of the full matrix, in the macroscopic regime where $N/M$, $m/M$ and $n/N$ converge to fixed ratios. Our method makes use of the dynamics of the singular vectors and of specific resolvents when the matrix coefficients follow Brownian trajectories. We obtain explicit forms for the limiting rescaled mean squared overlaps for right and left singular vectors in the bulk of both spectra, for any initial matrix $A$. When $A$ is null, this corresponds to the Marchenko-Pastur setup for covariance matrices, and our formulas simplify into Cauchy-like functions.
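
The quantities described in the abstract can be explored numerically. The following is a minimal sketch, not the authors' procedure: it draws one $M \times N$ Gaussian matrix, takes its top-left $m \times n$ block as the submatrix, and computes squared overlaps between the singular vectors of the two. The function name `squared_overlaps`, the choice of the top-left block, the default sizes, and the zero-padding used to compare vectors of different lengths are all assumptions made for illustration; the paper's rescaling, averaging, and Brownian-time parametrization are not reproduced here.

```python
import numpy as np


def squared_overlaps(M=60, N=90, m=30, n=60, seed=0):
    """Squared overlaps between singular vectors of a Gaussian matrix
    and those of its top-left m x n block (illustrative convention)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((M, N))   # full M x N Gaussian matrix
    sub = X[:m, :n]                   # assumed submatrix: top-left block

    # Thin SVDs: columns of U / rows of Vt are left / right singular vectors.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u, s_sub, vt = np.linalg.svd(sub, full_matrices=False)

    # A submatrix vector in R^m (resp. R^n) is compared with the first
    # m (resp. n) coordinates of a full vector, i.e. it is zero-padded.
    left = (u.T @ U[:m, :]) ** 2      # shape (min(m, n), min(M, N))
    right = (vt @ Vt[:, :n].T) ** 2   # shape (min(m, n), min(M, N))
    return s, s_sub, left, right


if __name__ == "__main__":
    s, s_sub, left, right = squared_overlaps()
    # With M <= N the left singular vectors of X span R^M, so each row of
    # `left` sums to 1; rows of `right` sum to at most 1.
    print(left.sum(axis=1)[:3], right.sum(axis=1)[:3])
```

Averaging `left` and `right` over many seeds, and multiplying by the matrix dimension, would give an empirical counterpart of a rescaled mean squared overlap under these assumed conventions.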

Paper Structure (18 sections, 155 equations, 1 figure)

Figures (1)

  • Figure 1: Comparison of our formulas for $\bar{V}$, $\bar{U}$ and $\bar{W}$ with numerical simulations of $N \, \mathbb{E}\left[V_{ij}(t)\right]$ (red solid curve for theory and red circles for data), $N \, \mathbb{E}\left[U_{ij}(t)\right]$ (blue solid curve for theory and blue triangles for data) and $N \, \mathbb{E}\left[W_{ij}(t)\right]$ (green solid curve for theory and green squares for data) for $M = 300$, $q = 0.9$, $\alpha = 0.4$, $\beta = 0.8$ and $t = 3$, as a function of $\lambda$ for a fixed $\mu = \mu(x,t)$. Left: $x = 0.9$. Middle: $x = 0.5$. Right: $x = 0.1$.

Theorems & Definitions (1)

  • proof